Are you looking for recent and old news about companies? We just created a Company News Search Engine for you! It is useful when you want to keep track of the latest news about a particular company and also want to keep track of older news. This service is especially interesting for investors. So what is the difference with Google News? Besides a normal search engine, we also offer an API that allows you to access the news programmatically.
In this article, we will build a basic news search engine that is capable of finding news by keyword. Since this is a complex system, we will first split it up into smaller modules. The first module retrieves the news from the internet. This module is called a scraper (or web scraper) and is written in Python. It maintains a file called the index, which stores a list of documents per keyword. For example, several documents contain the term “music”, so the index contains the term “music” and a list of references to all documents that contain the word “music”. But we will first start with our scraper.
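Before moving on to the scraper, the index idea described above can be illustrated with a small in-memory sketch. It is not the implementation used by the service: the function names `tokenize`, `build_index` and `search`, the document IDs and the sample headlines are all made up for illustration, and a real index would be stored in a file rather than in a dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted IDs of all documents containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical sample documents, keyed by a made-up document ID.
documents = {
    "doc1": "New music streaming service launches",
    "doc2": "Company reports record earnings",
    "doc3": "Music label signs deal with streaming company",
}

index = build_index(documents)
print(search(index, "music"))  # ['doc1', 'doc3']
```

Looking up a keyword is then a single dictionary access, which is why the index is built once up front instead of scanning every document for each query.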