Building a news search engine

In this article, we will build a basic news search engine that can find news by keyword. Since this is a complex system, I will first split it up into smaller modules. The first module retrieves all news from the internet. This module is called a scraper (or web scraper) and is written in Python. It maintains a file called the index, which contains a list of documents per keyword. For example, several documents contain the term “music”, so the index contains the term “music” along with a list of references to all documents that contain that word. But we will start with our scraper.
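To make the idea of the index concrete, here is a minimal sketch of a keyword-to-documents mapping (often called an inverted index). It is only an illustration, not the article's implementation: the function names `tokenize`, `build_index` and `search`, the document ids, and the sample texts are all assumptions, and the index is kept in memory rather than in a file.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, standing in for scraped news articles.
documents = {
    "doc1": "New music festival announced",
    "doc2": "Stock markets fall",
    "doc3": "Music streaming grows as markets shift",
}
index = build_index(documents)
```

With these sample documents, `index["music"]` lists `doc1` and `doc3`, and `search(index, "music markets")` narrows the result to the one document that contains both terms.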

Read more · 24 minutes
Data Blogger Courses

Mastering Pandas

In this course, you will learn how to use the Python library Pandas. After the course, you will be able to:

  • Load and transform your data
  • Visualize data using line plots, scatter plots and histograms
  • Merge and store data

The course also includes more advanced topics, such as data parallelization and aggregation.

You can see all course content under “Curriculum” on Data Blogger Courses. The first three lessons are free, and the first free lesson can be found here.

Read more

Facebook Hacker Cup 2016: High Security

Facebook has started its Facebook Hacker Cup 2016. It has some challenging exercises that you need to solve within a few hours, so you also need to write efficient code. If you don’t, you will not have enough time to solve an exercise and will lose points. In this article, I will highlight the “High Security” exercise from the Facebook Hacker Cup qualification round. I will first give the problem description as it was posted on Facebook and then explain how I tackled the problem.

Read more · 9 minutes

The Mathematics Behind: Polynomial Curve Fitting (MATLAB)

In the series “The Mathematics Behind” I will explain the mathematical concepts behind commonly used technologies. In this post, I will explain the mathematics behind polynomial curve fitting in MATLAB.

First of all, what is polynomial curve fitting and what is it used for? Suppose we are trading on a stock market. The stock price goes up and down (see the figure) and we want to discover patterns in the price changes, if any exist. Polynomial curve fitting tries to fit a model (here: a polynomial) to the given data points as well as possible.
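As a rough illustration of what “fitting as well as possible” can mean, the sketch below fits a polynomial by least squares, solving the normal equations with Gaussian elimination. It is written in plain Python rather than MATLAB and is an assumption about the approach, not the post's own derivation. The names `polyfit` and `evaluate` and the sample prices are hypothetical. Unlike MATLAB's `polyfit`, this version returns the coefficients from the lowest power to the highest.

```python
def polyfit(xs, ys, degree):
    """Least-squares fit; returns [c0, c1, ...] for c0 + c1*x + c2*x**2 + ..."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y, where A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Augmented matrix, reduced by Gaussian elimination with partial pivoting.
    m = [row + [b] for row, b in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution, from the last row upwards.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (m[r][n] - known) / m[r][r]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x using Horner's method."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Hypothetical prices that follow x**2 + 1 exactly, so the fit is easy to check.
days = [0, 1, 2, 3, 4]
prices = [1.0, 2.0, 5.0, 10.0, 17.0]
coeffs = polyfit(days, prices, 2)
```

The sketch assumes there are at least `degree + 1` distinct x values; otherwise the system is singular and the division by the pivot fails. For high degrees the normal equations become numerically unstable, which is why numerical libraries usually rely on other factorizations.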

Read more · 13 minutes