Google completed its Caffeine web indexing system in 2010, which lets it crawl and index the web for fresh content quickly and on an enormous scale. Building on the Caffeine infrastructure, Google has made what it calls a significant improvement to its search ranking algorithm.
On 3 November 2011, Google announced a change to its search algorithm that rewards ‘fresh content’. Google describes it as a major enhancement to its ranking algorithm, designed to return more up-to-date, relevant results for the following types of searches that depend on freshness:
1. Breaking news or trending topics: when you search for current events like ‘confidence vote in Greece parliament’, you want to see up-to-the-minute news, not commentary that is days old. With this algorithm change, you should see more high-quality, fresher pages.
2. Recurring events: events like the Olympics, parliamentary elections and conferences take place on a recurring basis. When you search for ‘election results’, you are really looking for the latest update on the elections that just took place, not the results of elections held 10 years ago.
3. Searches where the most up-to-date content is relevant: information about products, technology, places, politicians and the like changes frequently. When you search for ‘best compact digital cameras’, you are obviously looking for the latest cameras.
Google maintains that the search algorithm update will impact about 35% of searches.
Caffeine indexing infrastructure
Google’s crawling and indexing systems previously worked as batch processes. Googlebot would crawl a set of pages, then process those pages (extracting their content, associating data such as anchor text and external links with them, and determining what they were about), and finally add them to the index. Although this system ran continuously, every document in a batch had to wait until the whole batch had been processed before it was pushed live.
Now that the Caffeine web indexing system has gone live, when Google crawls a page, it processes that page through the entire indexing pipeline and pushes it live almost instantly. According to Google, this change has produced an index that is 50 percent fresher than before.
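The difference between the two publishing models can be illustrated with a toy simulation. The sketch below is not Google’s implementation: the function names, the fixed per-page processing time and the batch size are all hypothetical, chosen only to show why pushing each page live on its own shortens the average wait between crawl and appearance in the index.

```python
from typing import List


def batch_live_times(crawl_times: List[float], batch_size: int,
                     process_time: float) -> List[float]:
    """Time at which each page goes live when pages are published per batch.

    Toy assumption: a batch goes live `process_time` after its last page
    has been crawled, and every page in the batch goes live together.
    """
    live = []
    for start in range(0, len(crawl_times), batch_size):
        batch = crawl_times[start:start + batch_size]
        batch_live = max(batch) + process_time
        live.extend([batch_live] * len(batch))
    return live


def incremental_live_times(crawl_times: List[float],
                           process_time: float) -> List[float]:
    """Time at which each page goes live when it is published on its own."""
    return [t + process_time for t in crawl_times]


def mean_delay(crawl_times: List[float], live_times: List[float]) -> float:
    """Average wait between crawling a page and the page going live."""
    delays = [live - crawl for crawl, live in zip(crawl_times, live_times)]
    return sum(delays) / len(delays)


if __name__ == "__main__":
    # Six pages crawled one time unit apart (hypothetical numbers).
    crawls = [0, 1, 2, 3, 4, 5]
    batch = batch_live_times(crawls, batch_size=3, process_time=1)
    incremental = incremental_live_times(crawls, process_time=1)
    print(mean_delay(crawls, batch))        # 2.0
    print(mean_delay(crawls, incremental))  # 1.0
```

In this toy setup, the pages crawled first in each batch wait longest, because they cannot go live until the last page of the batch has been handled. The incremental model removes that wait, which is the intuition behind a fresher index.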
The change does not necessarily mean that search will improve by 35% or anything like it. Search has always been subject to spam and manipulation, most notoriously by content farms. Sites that scrape content sometimes rank higher than the original page. Google’s Panda update was one of the company’s major attempts to battle spam.
Some webmasters will probably try tweaking pages slightly and republishing them, hoping to be recognized as ‘fresh’.
Google has not revealed how “freshness” is determined. It seems safe to assume, however, that the amount of change and its timing are not the only factors. The topicality and quality of the content are likely to be factored in as well.
The latest tweets will still be missing from the updated search results. Google’s agreement with Twitter to carry tweets in its results expired on July 2, 2011. Since then, Google has not had access to Twitter’s special feed; without this firehose of data, it cannot index tweets fast enough to surface the latest ones.
How to leverage the Google update rather than get penalized
If you do not keep your website updated regularly, your site may lose ground in searches where freshness matters. To take advantage of the Google update rather than become its victim, webmasters need to keep their pages updated regularly with content such as the following:
- press releases or company announcements
- changes in management
- corporate events like sales meets and user group conferences
- product enhancements, new releases
- news from industry
SEO techniques such as making good use of titles, keywords, headings and ‘snippets’ will doubtless remain as relevant as before.