Indexing the past and present

With the shutdown of GeoCities Japan, we are reaching a turning point in the history of the Internet. Historical information is vanishing, and what replaces it is new information hidden away as small snippets in social media systems.

It is becoming increasingly apparent that a vast trove of information is simply missing from Google Search. Because Google Search aggressively favors well-ranked sites, user-made sites with obscure but useful information are indexed less thoroughly. Once those sites stop being maintained, their information is lost forever.

For instance, the only MIDI versions of the Pokémon Ruby and Sapphire music I could find were on a personal site hosted by Comcast. When Comcast shut down its personal sites, that content dropped out of search indexes for good and is now hidden away in the Internet Archive.

What I propose is indexing and ranking the content of the Internet Archive and social media networks to build a powerful search engine capable of searching past, present, and real-time data.

A major failing of Google Search over the years has been the way it dumbs down information while aggregating it into the Knowledge Graph, which limits the usefulness of complex queries. If a query is too complex (i.e. it contains keywords that are too far apart from one another), Google Search will ignore some of the keywords so that the query fits the data it has indexed, which is organized only around particular categories or keywords. If the whole complex query is forced, however, Google Search comes up with no results. This is not because the information does not exist, but because Google does not index or rank webpages in a way that is optimized for complex queries.

The corpus of information is also diversifying. E-books, chat logs, and Facebook conversations hold more information than can be found simply by crawling hyperlinked pages. Google Search, however, has not kept pace with this diversification. Instead, it has focused on developing the Knowledge Graph into both a primary and a secondary source of information.

I think this would be a great direction for a search engine such as DuckDuckGo to take in order to compete more directly with Google Search on a dimension other than privacy. After all, Google Search is no longer Google's main product.
