Simple Wiki Index
271222 pages indexed
This project was created to experiment with the Wikipedia dataset. The goal is to calculate the distance between two pages, that is, the minimum number of clicks needed to get from one page to the other using only the links on each page.
Currently, only the Simple Wikipedia has been indexed, because the entire index (page titles plus references) is held in memory. Indexing the English Wikipedia would require more memory, storing part of the data on disk, or reducing the memory footprint in some other way.
You can find the source code on GitHub.
Most referenced page
The page United States has been referenced 21632 times.
Least referenced page
The page Danny Gruen has been referenced 0 times.
What can you do here?
Page Distance
On this page you can look up the distance between two pages, that is, how many clicks are required to get from one page to the other using only the links on the pages.
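One common way to compute such a click distance is a breadth-first search over the link graph. The following is only a minimal sketch of that idea, not necessarily how this project implements it. The function name `page_distance`, the dictionary-based `links` structure, and the toy page names are all assumptions made for illustration.

```python
from collections import deque


def page_distance(links, start, target):
    """Return the fewest clicks needed to reach target from start.

    links is a hypothetical in-memory index: a dict mapping each page
    title to the titles it links to. Returns None if target is unreachable.
    """
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        page, clicks = queue.popleft()
        for neighbor in links.get(page, ()):
            if neighbor == target:
                return clicks + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, clicks + 1))
    return None


# Toy link graph (made-up pages, not real Wikipedia data).
toy_links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}

print(page_distance(toy_links, "A", "E"))
print(page_distance(toy_links, "E", "A"))
```

Because links are directed, the distance from one page to another can differ from the distance in the opposite direction, and a page may be unreachable altogether.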
Search for Pages
Here you can search all indexed pages. When you click on a page, you can see which pages reference it and which pages it references.
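The two views of a page (what it links to and what links to it) can be derived from a single table of outgoing links by inverting it. The sketch below illustrates this under assumed names; `build_referenced_by`, `reference_counts`, and the sample pages are hypothetical and are not taken from the project's source code.

```python
def build_referenced_by(links):
    """Invert outgoing links into a 'referenced by' mapping.

    links is a hypothetical dict of page title -> list of linked titles.
    Every known page gets an entry, even if nothing references it.
    """
    referenced_by = {}
    for page, targets in links.items():
        referenced_by.setdefault(page, set())
        for target in targets:
            referenced_by.setdefault(target, set()).add(page)
    return referenced_by


def reference_counts(referenced_by):
    """Count how many distinct pages reference each page."""
    return {page: len(sources) for page, sources in referenced_by.items()}


# Made-up sample data for illustration only.
sample_links = {
    "Home": ["About", "News"],
    "About": ["Home"],
    "News": ["Home", "About"],
    "Orphan": ["Home"],
}

referenced_by = build_referenced_by(sample_links)
counts = reference_counts(referenced_by)
most_referenced = max(counts, key=counts.get)
least_referenced = min(counts, key=counts.get)
print(most_referenced, counts[most_referenced])
print(least_referenced, counts[least_referenced])
```

The same counts would also yield statistics such as the most and least referenced pages.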