Searching on the web.

The Mathematics of Search Engines

In this short post, I will give an overview of some technology that is used by well-known search engines.

Chances are high that you found this webpage by using a search engine. Think about that (and imagine the following if you ended up here by other means). A couple of seconds ago, you thought of some keywords, you entered them in a search engine and you clicked a link. The search engine provided you this webpage in just a few milliseconds! What is the mechanism that enabled such a high speed?

For the described search task, the search engine uses techniques developed in a field called Information Retrieval. Now imagine you are typing in the words “Leonardo da Vinci” in your favorite search engine. Many search engines display information about the painter (like his birth and death) but also information about related concepts. One¬†core concept¬†you find in almost any search engine nowadays, is a so-called Knowledge Graph. In this graph, entities like “Leonardo da Vinci”, his famous “Mona Lisa” and other artists like “Vincent van Gogh” are interconnected. Entities are the smallest possible things that can not be divided in smaller parts. Each of these entities has attributes (like the date of birth and a name) but these attributes are different for different entities. For example, painters have a date of birth. But buildings like “The White House” probably does not have a date of birth. Furthermore, these entities also have types. “Vincent van Gogh” and “Leonardo da Vinci” both have a type “Person”, but “The White House” does not belong to that type. In summary, a Knowledge Graph consists of nodes (entities) and arcs (relations between entities) and the entities have attributes and types.

It is even possible to use different representation schemes in which less or more features are used for entities. One very simplistic model only contains entities and relations among the entities. Here, attributes and types are discarded.

In the next weeks, I will post more information about Knowledge Graphs and how these graphs can be utilized for practical applications. If you have any questions or if you would like more information about some topic related to Knowledge Graphs, please let me know!

Kevin Jacobs

Kevin Jacobs

Kevin Jacobs is a certified Data Scientist and blog writer for Data Blogger. He is passionate about any project that involves large amounts of data and statistical data analysis. Kevin can be reached using Twitter (@kmjjacobs), LinkedIn or via e-mail: mail@kevinjacobs.nl.