Building a news search engine

News papers

In this article, we will build a basic news search engine that is capable of finding news by keywords. Since this is a complex system, I will first split the system up into smaller modules. The first module is the module that retrieves all news from the internet. This module is called a scraper (or web scraper) and is written in Python. It maintains a file called the index. This is a file that contains a list of documents per keywords. For example, several documents contain the term “music”, so the index contains the term “music” and a list with references to all documents that contain the word “music”. But we will first start with our scraper.

(more…)

Read More