In this article, we will build a basic news search engine that can find news articles by keyword. Since this is a complex system, I will first split it into smaller modules. The first module retrieves news from the internet. This module is called a scraper (or web scraper) and is written in Python. It maintains a file called the index, which stores a list of documents for each keyword. For example, several documents contain the term “music”, so the index contains the term “music” along with a list of references to all documents that contain that word. But we will start with our scraper.
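To make the idea of the index concrete before moving on, here is a minimal sketch of such a keyword-to-documents mapping. It is not the implementation used later in this article: the function name `build_index`, the document ids, and the sample headlines are all made up for illustration, and the index is kept in memory as a dictionary rather than written to a file.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to a sorted list of ids of documents containing it.

    `documents` is a hypothetical dict of {document id: document text}.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Split the text into lowercase word tokens.
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    # Convert the sets to sorted lists so lookups return a stable order.
    return {term: sorted(doc_ids) for term, doc_ids in index.items()}


# Made-up sample documents, only for illustration.
documents = {
    "doc1": "Live music festival opens downtown",
    "doc2": "Stock markets close higher",
    "doc3": "New music streaming service launches",
}

index = build_index(documents)
print(index["music"])  # ['doc1', 'doc3']
```

Looking up a keyword is then a single dictionary access, which is what makes searching by keyword fast compared to scanning every document.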
It is 2016, and Facebook has kicked off its Facebook Hacker Cup 2016. It has some challenging exercises that you need to solve within a few hours. You also need to write efficient code: otherwise, you will run out of time on an exercise and lose points. In this article, I will highlight the “High Security” exercise from the Facebook Hacker Cup qualification round. I will first give the problem description as posted on Facebook and then explain how I tackled the problem.