Simple Wiki Index

271222 pages indexed

This project was created to experiment with the Wikipedia dataset. The goal is to calculate the distance between two pages, that is, the minimum number of clicks required to get from one page to the other using only the links on each page.
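The distance described above is a shortest path in a directed graph, which a breadth-first search finds directly. The following is a minimal sketch, not the project's actual code: it assumes the index is available as a plain dictionary mapping each page title to the titles it links to, and the names `page_distance` and `links` are made up for illustration.

```python
from collections import deque


def page_distance(links, start, goal):
    """Return the minimum number of clicks from start to goal, or None if unreachable.

    `links` is an assumed structure: {page title: iterable of linked page titles}.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, clicks = queue.popleft()
        for target in links.get(page, ()):
            if target in seen:
                continue
            if target == goal:
                return clicks + 1
            seen.add(target)
            queue.append((target, clicks + 1))
    return None


# Hypothetical toy data, not taken from the real index.
links = {
    "Apple": ["Fruit", "Tree"],
    "Fruit": ["Plant"],
    "Tree": ["Plant", "Wood"],
    "Plant": ["Biology"],
}
```

Because links are directed, the distance from one page to another can differ from the distance in the opposite direction, and some pairs have no path at all.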

Currently, only the Simple Wikipedia has been indexed, because the entire index (page titles plus references) is stored in memory. Indexing the English Wikipedia would require more memory, storing part of the data on disk, or some other way of reducing the memory footprint.
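One possible way to shrink the in-memory footprint is to replace title strings in the link lists with integer IDs and to pack all links into flat arrays. The sketch below illustrates that idea only; it is an assumption about what could be done, not a description of how this project stores its index, and the function names are hypothetical.

```python
from array import array


def build_compact_index(links):
    """Pack {title: [linked titles]} into flat unsigned-int arrays.

    The outgoing links of page i are targets[offsets[i]:offsets[i + 1]].
    """
    titles = sorted(set(links) | {t for ts in links.values() for t in ts})
    ids = {title: i for i, title in enumerate(titles)}
    offsets = array("I", [0])
    targets = array("I")
    for title in titles:
        targets.extend(sorted(ids[t] for t in links.get(title, ())))
        offsets.append(len(targets))
    return titles, ids, offsets, targets


def outgoing(index, title):
    """Return the titles that the given page links to."""
    titles, ids, offsets, targets = index
    i = ids[title]
    return [titles[j] for j in targets[offsets[i]:offsets[i + 1]]]


# Hypothetical toy data.
index = build_compact_index({"A": ["B", "C"], "B": ["C"]})
```

Each link then costs one small integer (typically 4 bytes for type code "I") instead of a reference to a string object inside a Python list, and every title is stored once.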


You can find the source code on GitHub.

Most referenced page

The page United States has been referenced 21632 times.

Least referenced page

The page Mitlodi has been referenced 0 times.

What can you do here?

Page Distance

On this page you can look up the distance between two pages, that is, how many clicks are required to get from one page to the other using only the links on the pages.

Search for Pages

Here you can search all indexed pages. When you click on a page, you can see which pages reference it and which pages it references.
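Showing both directions for a page needs a reverse lookup in addition to the outgoing links. The sketch below shows one way this could work, assuming the same kind of title-to-links dictionary; the helper names and the case-insensitive substring search are illustrative assumptions, not the site's actual behavior.

```python
from collections import defaultdict


def build_reverse_index(links):
    """Map each page title to the set of pages that link to it."""
    referenced_by = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            referenced_by[target].add(source)
    return referenced_by


def search_titles(titles, query):
    """Return all titles containing the query, ignoring case."""
    q = query.casefold()
    return sorted(t for t in titles if q in t.casefold())


def page_details(links, referenced_by, title):
    """Return outgoing and incoming references for one page."""
    return {
        "references": sorted(links.get(title, ())),
        "referenced_by": sorted(referenced_by.get(title, ())),
    }


# Hypothetical toy data.
toy_links = {"Apple": ["Fruit", "Tree"], "Tree": ["Fruit"], "Fruit": []}
toy_reverse = build_reverse_index(toy_links)
```

A page whose reverse-index entry is empty is one that no other indexed page links to, which corresponds to a reference count of 0.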