By Mark Rosch
One of the questions we are often asked during our live MCLE seminars and MCLE webinars is whether there is a keyword search available for the billions of Web pages collected by Archive.org's Wayback Machine. Since the Wayback Machine's inception in 2000, users have been able to browse earlier versions of websites by entering the URL of a known site into the main search box and clicking Browse History.
Even though Archive.org has attempted several times to develop a keyword search for its collection, none ever worked well enough to become a permanent addition. So, the answer has always been, "No, there is no keyword search of the Web pages stored in the Internet Archive's Wayback Machine," ...until now...sort of. Archive.org has recently added a new keyword search that covers a portion of its collection.
The Internet Archive announced the keyword search feature in a blog post that included the following:
With this new beta search service, users will now be able to find the home pages of over 361 Million websites preserved in the Wayback Machine just by typing in keywords that describe these sites (e.g. “new york times”). As they type keywords into the search box, they will be presented with a list of relevant archived websites with snippets containing:
- a link to the archived versions of the site’s home page in the Wayback Machine
- a thumbnail image of the site’s homepage (when available)
- a short description of the site’s homepage
- a capture summary of the site
  - number of unique URLs by content type (webpage, image, audio, video)
  - number of valid web captures over the associated time period
Key Features
- Search as you type
  - Instant results as you type: predictive, interactive and speedy
- Multilingual
  - Search in any language or using symbols, expanding scope and utility
- Site-based Filtering
  - Limit results to certain websites or domains using the site: operator (e.g. site:edu)
One important limitation of this new search is that it searches ONLY text on the home pages of the sites and NOT all of the pages of sites contained in the Wayback Machine.
It's interesting to note that while the blog post refers to the site's ability to keyword search "the home pages of over 361 Million websites," the main search box currently offers visitors the ability to "Search the history of over 279 billion web pages on the Internet" using a specific URL. Before the introduction of the new home page keyword search, the site boasted nearly 475 billion Web pages in its collection. While the difference in the number of browseable historic pages may be explained by the distinction Archive.org makes between "Web pages, Web sites, and Web captures," we are awaiting a response from the organization to confirm.
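For readers who would rather script the URL-based browsing described above than type into the search box, here is a minimal sketch in Python. It assumes the Wayback Machine's commonly used address pattern, `https://web.archive.org/web/<timestamp>/<url>`, where `*` in place of the timestamp lists all captures. The function name `wayback_url` and its parameters are our own illustration, not part of any Archive.org tool, and the sketch only builds an address; it does not contact the site or touch the new keyword search.

```python
# Hypothetical helper: builds a Wayback Machine address for a known URL.
# Assumes the https://web.archive.org/web/<timestamp>/<url> pattern.
WAYBACK_BASE = "https://web.archive.org/web"


def wayback_url(site_url: str, timestamp: str = "*") -> str:
    """Return a Wayback Machine address for site_url.

    timestamp is either "*" (list every capture of the URL) or a string of
    4 to 14 digits in YYYYMMDDhhmmss order, e.g. "2016" or "20161024".
    """
    is_digits = timestamp.isascii() and timestamp.isdigit()
    if timestamp != "*" and not (is_digits and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be '*' or 4 to 14 digits")
    # Strip stray whitespace that often comes along when pasting a URL.
    return f"{WAYBACK_BASE}/{timestamp}/{site_url.strip()}"


if __name__ == "__main__":
    # All captures of a site, then the capture nearest a given date.
    print(wayback_url("nytimes.com"))
    print(wayback_url("nytimes.com", "20161024"))
```

Pasting either printed address into a browser should land on the same capture listing or archived page you would reach by entering the URL in the main search box.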