Google's Head of Webspam, Matt Cutts, has posted a video to the Google Webmaster YouTube channel explaining what he has been saying in private and on conference platforms for years: SEO per se is not spam. Ethical SEO, as practised and long advocated by me (so much so that I worked with Matt when he was putting together the original Google Webmaster Guidelines a decade ago this month), is certainly not spam; but some forms of SEO, in particular black hat SEO, are spam. The video is below and, for those of you without video playing capabilities, a transcript prepared by Lynda follows:
Transcript of "Does Google Consider SEO to be Spam?" By Matt Cutts
I wanted to take a minute and talk a little bit about search engine optimization and spam, and answer the question “Does Google consider SEO to be spam?”
And the answer is “No. We don’t consider SEO to be spam.” Now a few really tech savvy people might get angry at that. So let me explain in a little more detail. SEO stands for Search Engine Optimization. And essentially it just means trying to make sure that your pages are well represented within search engines. And there’s plenty...an enormous amount ...of white hat, great quality stuff that you can do as a search engine optimizer.
You can do things like making sure that your pages are crawlable. So you want them to be accessible. You want people to be able to find them just by clicking on links. And in the same way search engines can find them just by clicking on links.
You want to make sure that people use the right keywords. If you’re using industry jargon or lingo that not everybody else uses, then a good SEO can help you find out, oh, these are keywords that you should have been thinking about.
You can think about usability, and trying to make sure that the design of the site is good. That’s good for users and for search engines.
You can think about how to make your site faster. Not only does Google use site speed in our rankings as one of the many factors that we use in our search rankings. But if you can make your site run faster, that can also make it a much better experience.
So there are an enormous number of things that SEOs do, everything from helping out with the initial site architecture and deciding what your site should look like, and the URL structure, and the templates, and all that sort of stuff, making sure that your site is crawlable, all the way down to helping optimize for your return on investment. So trying to figure out what are the ways that you are going to get the best bang for the buck, doing A/B testing, trying to find out, OK, what is the copy that converts, all those kinds of things. There is nothing at all wrong with all of those white hat methods.
Now, are there some SEOs who go further than we would like? Sure. And are there some SEOs who actually try to employ black hat techniques, people that hack sites or that keyword stuff and just repeat things or that do sneaky things with redirects? Yeah, absolutely. But our goal is to make sure that we return the best possible search results we can. And a very wonderful way that search engine optimizers can help is by cooperating and trying to help search engines find pages better. So SEO is not spam. SEO can be enormously useful. SEO can also be abused and it can be overdone.
But it’s important to realise that we believe, in an ideal world, people wouldn’t have to worry about these issues. But search engines are not as smart as people yet. We’re working on it. We’re trying to figure out what people mean. We’re trying to figure out synonyms, and vocabulary, and stemming so that you don’t have to know exactly the right word to search for what you wanted to find. But until we get to that day, search engine optimization can be a valid way to help people find what they are looking for via search engines.
We provide webmaster guidelines on google.com/webmasters. There’s a free webmaster forum. There are free webmaster tools. There’s a ton of HTML documentation. So if you search for SEO starter guide, we’ve written a beginner guide where people can learn more about search engine optimization.
But just to be very clear, there are many, many valid ways that people can make the world better with SEO. It’s not the case that...sometimes you’ll hear SEOs are criminals. SEOs are snake oil salesmen. If you find a good person, someone that you can trust, someone that will tell you exactly what they’re doing, the sort of person where you get good references, or you’ve seen their work and it’s very helpful, and they’ll explain exactly what they’re doing, they can absolutely help your website. So I just wanted to dispel that misconception.
Some people think Google thinks all SEO is spam and that’s definitely not the case. There are a lot of great SEOs out there. And I hope you find a good one to help with your website.
Google has announced that it is to cease providing referrer information in some instances. In the official blog post, Google's Evelyn Kao writes:
When you search from https://www.google.com, websites you visit from our organic search listings will still know that you came from Google, but won't receive information about each individual query.
Initially this change affects people logged in to Google accounts and using Google.com which, Google claims, is a very small percentage of searchers (although still a large number of people). But it's likely this will change as, according to Google's own blog entry:
As we continue to add more support for SSL across our products and services, we hope to see similar action from other websites.
To give an example of what Google have actually done, I have searched today for "car insurance" both logged in to my Google account and searching on https://www.google.com, and not logged in to my Google account and searching on http://www.google.com/. In each case I have clicked through to the same landing page. Here are the referrers of that landing page in both cases:
Referrer When Not Logged In, Clicking On A Natural Link: http://www.google.com/#hl=en&sugexp=kjrmc&cp=5&gs_id=l&xhr=t&q=car+insurance&qe=Y2FyIGk&qesig=Eeu3hebYxgo0in9YDLhtAA&pkc=AFgZ2tkKH3Xw88yrwvHzg5MkB-5vAi8dBrAzxf3se4-a7_BaiiecMyYZt0D_3TtcaX8K2jJgbEC3Yw7qMsDB65pNgSjYWjDjlA&pf=p&sclient=psy-ab&source=hp&pbx=1&oq=car+i&aq=0p&aqi=p-p1g3&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=8e7fa2636e8b849&biw=1680&bih=947
Referrer When Logged In, Clicking On A Natural Link: http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&sqi=2&ved=0CHoQFjAA&url=http%3A%2F%2Fwww.moneysupermarket.com%2Fcar-insurance%2F&ei=RoaeTuS6IY_D8QOP8fixCQ&usg=AFQjCNF-UvvfJsjMbuyeGwVVnyzkQmInRA&sig2=NaTkFkfi3cK7R1_V5TsTcg
The key difference is the q parameter. When not logged in, my query "car insurance" is present in the referrer (q=car+insurance), so Analytics can pick it up and tell the site owner what I was looking for. When logged in, the parameter is empty (q=), so the site owner is completely clueless about why Google sent me to that page on their website. Note, too, that when logged in the referrer is a lie: the page I was visiting before was not the one in the referrer at all. I was actually on https://www.google.com/, not http://www.google.com/.
This small change has some very large consequences for site owners. For example, no matter what analytics package you use, any reports that show keywords will become less useful (and, at the extreme, useless). Check out this short interview from Google Analytics Evangelist Avinash Kaushik, following his keynote at 2010's Search Engine Strategies (which I attended with interest):
If Google removes keywords from referrer data then all of the great keyword ideas, keyword techniques and keyword attribution models that Avinash shares are no longer possible. Evangelise that, Avinash!
Joking aside, a lot of the great work SilverDisc and others do in making sites better for users will be made more difficult and less effective by this move.
Google's move upsets the ethical balance that exists between searchers, search engines and site owners. This balance is the very principle that ethical SEO is based upon. The three stakeholders to be considered are:
Site owners who produce great content designed to meet their visitors' needs.
Search engines who are allowed to crawl and index that content as long as it provides benefit to the site owner.
Searchers who get to find the information they need in order to satisfy their enquiry.
From my original ethical SEO paper, the most ethical technique
produces the most good and does the least harm
respects the rights and dignity of all stakeholders and treats all stakeholders fairly
promotes the common good
helps all participate more fully in the goods we share as a community and a society
enables the deepening or development of those virtues or character traits that we value as individuals, professions and members of a society
How does Google removing referrer information produce an unethical result? Let's break it down:
produces the most good and does the least harm?
site owners can no longer optimise their sites to better match the searcher needs, so they will struggle to produce the best possible websites
respects the rights and dignity of all stakeholders and treats all stakeholders fairly?
site owners, rather than being treated with dignity, are treated as being "not trustworthy" and are denied a piece of information that the other two stakeholders (Google and the searcher) both have - the search query that resulted in that searcher visiting their site.
promotes the common good?
the common good is Google working with site owners to produce a better Web, which to be fair does happen a lot in other ways. This move, however, does not promote the common good - Google gains and the site owner loses.
helps all participate more fully in the goods we share as a community and a society?
clearly this move prevents full participation of site owners in something they have had available to them since the earliest days of the Web and something upon which the Web was built: the Referer header was already defined in the HTTP/1.0 specification (RFC 1945) and has been there ever since
enables the deepening or development of those virtues or character traits that we value as individuals, professions and members of a society?
again, this move alienates site owners and does not engender a spirit of cooperation and teamwork among site owners and Google, whose entire service is built on the content that site owners freely provide
What's really evil about Google's announcement is the patronising spin they've put on it. Google's headline, even on its Analytics blog which is aimed at site owners rather than searchers, is not "We're removing site owners' ability to pull keywords from the referrer"; it is "Making search more secure: Accessing search query data in Google Analytics". This fails to treat site owners with the respect they deserve. The whole piece is positioned as making search more secure, for example when using insecure Wi-Fi hotspots, yet at least a couple of things don't stack up if this is the objective:
If the user is visiting a secure Web site then Google still strips the referrer (thanks Danny Sullivan at Search Engine Land for this info), even though this is not necessary and, given they don't do this on their Encrypted Search, Google clearly knows it's not necessary
Searchers' referrers still contain keywords if searchers click on an ad, rather than a natural result.
That last point really shows where Google's mind is at. To juxtapose a couple of points from their blog post:
we recognize the growing importance of protecting the personalized search results we deliver. As a result, we’re enhancing our default search experience for signed-in users ... [but] ... if you choose to click on an ad appearing on our search results page, your browser will continue to send the relevant query over the network to enable advertisers to measure the effectiveness of their campaigns and to improve the ads and offers they present to you
So advertisers who pay Google money get treated one way, site owners who pay Google by providing the content the whole Google service is built on get treated a different way, and searchers' privacy is not really protected. Nice. To complete the example I gave earlier, the third link below is the referrer I received on the same website as result 2, but this time clicking on a paid ad rather than a natural result:
Referrer When Not Logged In, Clicking On A Natural Link: http://www.google.com/#hl=en&sugexp=kjrmc&cp=5&gs_id=l&xhr=t&q=car+insurance&qe=Y2FyIGk&qesig=Eeu3hebYxgo0in9YDLhtAA&pkc=AFgZ2tkKH3Xw88yrwvHzg5MkB-5vAi8dBrAzxf3se4-a7_BaiiecMyYZt0D_3TtcaX8K2jJgbEC3Yw7qMsDB65pNgSjYWjDjlA&pf=p&sclient=psy-ab&source=hp&pbx=1&oq=car+i&aq=0p&aqi=p-p1g3&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=8e7fa2636e8b849&biw=1680&bih=947
Referrer When Logged In, Clicking On A Natural Link: http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&sqi=2&ved=0CHoQFjAA&url=http%3A%2F%2Fwww.moneysupermarket.com%2Fcar-insurance%2F&ei=RoaeTuS6IY_D8QOP8fixCQ&usg=AFQjCNF-UvvfJsjMbuyeGwVVnyzkQmInRA&sig2=NaTkFkfi3cK7R1_V5TsTcg
Referrer When Logged In, Clicking On A Paid Link: http://www.google.com/url?http://www.google.com/aclk?sa=L&ai=ClrXYRoaeTt7BI8We8APK543yBYrGqWP-obrkI4TN7AQQBigIUOvGl8f4_____wFgu76ug9AKyAEBqQIwM2eSHF26PqoEGk_QTdQtj7np_5xRJavQhhGPHLhFRZtF9pdvugUTCOT1h6Om9KsCFY9hfAodjzg-lsoFAA&num=9&ei=RoaeTuS6IY_D8QOP8fixCQ&sig=AOD64_10PBtIEOuf9waR5LMaPUiMrDinMA&sqi=2&ved=0CEIQ0Qw&adurl=http://pixel.everesttech.net/1816/cq%3Fev_sid%3D3%26ev_ln%3Dcar%2520insurance%26ev_crx%3D9512830502%26ev_mt%3De%26ev_n%3Dg%26ev_ltx%3D%26ev_pl%3D%26url%3Dhttp%253A//www.moneysupermarket.com/car-insurance/insurance/%253FSource%253DGOO-003881E4%2526keywords%253Dcar%252Binsurance%252B%252BExact%2526p%253D0&rct=j&q=car+insurance
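The three referrers above can be told apart mechanically. The following is a minimal Python sketch, not a description of how Google Analytics actually works: it looks for a non-empty q parameter in the query string and then in the fragment (the first referrer carries its query after the #), and it treats any referrer containing /aclk as a paid click. That /aclk rule is a heuristic drawn only from these three examples, and the function name is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def classify_google_referrer(referrer: str) -> Tuple[str, Optional[str]]:
    """Return (click_type, query) for a Google referrer URL.

    click_type is "paid" or "organic"; query is None when no keyword survives.
    Heuristic only: "/aclk" in the referrer is taken to mean an ad click,
    based solely on the example referrers in this post.
    """
    click_type = "paid" if "/aclk" in referrer else "organic"
    parts = urlsplit(referrer)
    # Check the query string first, then the fragment after "#".
    for component in (parts.query, parts.fragment):
        # parse_qs drops blank values by default, so "q=" yields no "q" key,
        # and it decodes "+" as a space ("car+insurance" -> "car insurance").
        values = parse_qs(component).get("q")
        if values:
            return click_type, values[0]
    return click_type, None
```

Applied to the three referrers above, in order, this returns ("organic", "car insurance"), ("organic", None) and ("paid", "car insurance"): the keyword survives for the advertiser but not for the site owner whose natural result was clicked.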
What can site owners do about this? Individually, not a lot. Promoting and using other search engines, such as Microsoft Bing, would be a start. This strikes me as a great opportunity for Microsoft to build and foster better relationships with site owners, for example by promising never to remove referrer data from its search results.
Google's permission to crawl a site can be taken away with two simple lines of code placed in that site's robots.txt file:
User-agent: Googlebot
Disallow: /
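Whether those two lines really shut out Googlebot, and only Googlebot, can be checked with Python's standard urllib.robotparser module. A minimal sketch; example.com and Bingbot are placeholders chosen for illustration:

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post: disallow Googlebot everywhere.
ROBOTS_TXT = """User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so nothing is fetched over the network.
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/any/page"))  # False
print(parser.can_fetch("Bingbot", "https://example.com/any/page"))    # True: no rule names Bingbot
```

Because the rule names Googlebot rather than *, other crawlers are unaffected, which is what makes the "strike" a Google-specific one.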
Sure, every site owner in the world would need to publish this file to their sites. But if they did such a thing, the Google search engine could no longer crawl or index any of the Web's content. It would be defunct.
So, fellow site owners, Google's future is in our hands. If you want to go "on strike" and stop Google profiting from the fruits of your labours, simply publish the code. Be warned that your site will eventually be removed from Google's index if you do so. As a unilateral step, this may do you more harm than good. But if we all do it en masse, then beware Google!
That post was written four years ago. Now, with social media so prevalent that it can lead to regime change in countries, maybe it can lead to regime change among search engines too. Microsoft, are you listening?
Google has this week launched new advice on how to mark up a series of related pages in order to allow it to better understand the relationship between those pages. This could offer you the benefit of consolidating the pages into a single page for ranking calculations - which could be very helpful to say the least. Examples of pages that may gain from using this markup, which involves using rel=prev and rel=next tags in a page's head section, are
an article or forum thread spread across multiple pages, perhaps to derive greater advertising revenues or keep the text short and easy to consume
a product category consisting of so many products that they can't fit on one page. An example would be a top level category such as "Family Cars", before many filters had been applied to create smaller sub-categories that could easily fit on a page (e.g. "red 1.8L diesel automatic Volkswagen family cars near Kettering")
For more details on how to implement these tags see Google Webmaster Central: Pagination with rel=“next” and rel=“prev”. The article is well-written and gives very clear implementation advice. It includes a reference to a related Google post, Google Webmaster Central: View-all in search results, which describes how a rel=canonical tag can be used to specify a "View-all page", which is simply a single-page version of the content that may be presented elsewhere as a series of pages. Google makes the claim in this article that "searchers much prefer the View-all, single-page version of content".
But do searchers much prefer View-all pages? I'm sure they do if the View-all page is relatively short. Using a couple of Google's own examples of where rel=prev and rel=next may be useful, however:
a forum thread spread across multiple pages. I moderate forums and some threads can easily spread to 1000 or more responses. It's unlikely a member would want all of these on a single page for viewing
a product category consisting of many products. Again, a top level category could easily consist of over 1000 products.
It's interesting to note that a typical Google search yields millions of results and Google will display up to 1000 of them, by default across 100 pages at 10 results to a page. Google isn't implementing a View-all page there!
I think the example that Google really has in mind when they state that searchers "prefer the View-all version of content" is the article that might spread over three pages or so: reducing that to one page for indexing. This seems a fine idea.
But what to do about the long forum threads and product categories? Should we create View-all pages for those? I think not. Such pages could be too big and unwieldy, and could take too long to load, which (especially given that load time is now a ranking factor) could work against the SEO rather than for it.
Another option would be to create a View-all page containing less information, e.g. a cut down version of each post in the forum or each product in the category. This might be a good solution. Bear in mind, however, that Google is looking to rank this View-all page in preference to a paginated page, so
don't cut out content that contains long-tail keywords for ranking and
make sure if this page is going to rank well that it's a good landing page that can help the searcher achieve what you want them to achieve on your site
Another option is to deploy this strategy:
if your default posts or products "per-page" count is a small number (such as 10 products/page), consider changing it to a bigger number now (such as 50). This will reduce the number of pages in your page sequences dramatically. It will also increase the size of each page, but technology has moved on: 10 became the standard when the Web was a lot slower than it is now, and 50 seems a more appropriate number to me. It's a good number of products to compare in one go, for example.
once you have shorter series of larger pages, use the rel=prev and rel=next tags as described by Google.
If it's a product sequence, add a rel=canonical tag to each page in the series to make the URL of the first page in the series the canonical URL. It's OK to do this for a product sequence, as Google's rel=canonical documentation stated that "the sort order of a table of products" was an acceptable use of a rel=canonical tag. Since it's unlikely you would want to change the sort order of a set of article pages or forum posts, a canonical tag is less appropriate for those series than for a product sequence.
For example, let's suppose you currently have a category of Family Cars that consists of 238 cars with 10 cars per page giving a series of 24 pages with the following URLs:
/cars/family/1
/cars/family/2
...
/cars/family/23
/cars/family/24
Here's what you could do:
Increase the default number of cars per page from 10 to 50. Now only 5 pages are needed to cover the series: /cars/family/1 ... /cars/family/5
Add a rel=next tag to /cars/family/1, a rel=prev tag to /cars/family/5, and both a rel=prev and a rel=next tag to the intervening three pages, as described by Google
Add rel=canonical tags to all five pages, citing /cars/family/1 as the canonical URL.
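The worked example above can be sketched in Python as a helper that emits the head tags for any page in the series. It is a minimal sketch, assuming the /cars/family/N URL scheme from the example; the function names are hypothetical, and the canonical-to-page-1 behaviour follows this post's suggestion for product sequences, so it can be switched off for article or forum series.

```python
from typing import List


def page_count(total_items: int, per_page: int) -> int:
    """Number of pages needed, rounding up (238 items at 10/page -> 24)."""
    return (total_items + per_page - 1) // per_page


def pagination_link_tags(base: str, page: int, total_items: int, per_page: int,
                         canonical_to_first: bool = True) -> List[str]:
    """Return the <link> tags for the <head> of page `page` (1-based).

    `base` is the hypothetical URL prefix, e.g. "/cars/family/".
    """
    last = page_count(total_items, per_page)
    if not 1 <= page <= last:
        raise ValueError(f"page must be between 1 and {last}")
    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{base}{page - 1}">')
    if page < last:
        tags.append(f'<link rel="next" href="{base}{page + 1}">')
    if canonical_to_first:
        # This post's suggestion for product sequences: every page names page 1.
        tags.append(f'<link rel="canonical" href="{base}1">')
    return tags


for n in range(1, page_count(238, 50) + 1):
    print(f"/cars/family/{n}:", pagination_link_tags("/cars/family/", n, 238, 50))
```

Page 1 gets only a next tag, page 5 only a prev tag, and pages 2 to 4 get both, matching the steps listed above.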
Google recently announced Google Instant, their new "search-before-you-type" service. Google Instant purports to predict what you are searching for based on the characters you've already entered into the search box, perform the search, and return the results to your browser before you have finished typing. Try Google Instant for yourself. Here's a Google-produced video that describes Google Instant in more detail. Google claims that Instant offers searchers such advantages as Faster Searches, Smarter Predictions and Instant Results. Many of these features were already available with Google Autocomplete, so the real difference is that predicted search results are pulled into the browser in real time. This looks like a game changer for PPC and SEO. Here's why:
Results are predicted and pulled in after only a few characters are entered by the searcher (in this case, three characters - "cre" - pulled in the results for "credit cards").
The suggestions push the natural results down the page. This is different to the old Autocomplete model where, for the most part, results are browsed whilst suggestions are not on the screen, allowing more screen real estate for results.
As a result of the above, there are ten ads and only one natural result above the fold.
It gets worse. The matched term is the highly competitive head term, "credit cards". The searcher may have been en route to entering a niche tail term, but now they have been distracted into searching a narrow range of head terms, which are expensive for AdWords advertisers and highly competitive for SEO - and only one natural result is visible above the fold anyway, versus ten paid results!
The AdWords auction thus becomes focused on smaller baskets of highly competitive keywords, rather than a long tail of cheaper keywords, and natural results take on lesser prominence. The net effect of Google Instant could therefore be more head term searches, fewer tail term searches, more high-cost ad impressions and clicks, and fewer low-cost ads and free natural results clicked. If this is Google's commercial intent then it looks like a great idea - for them. I'm not sure if searchers or most advertisers would agree, which is why I think that, over time, Google Instant will have to change.
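To make the head-term effect concrete, here is a deliberately simplified toy model of prefix prediction in Python. It is not Google's algorithm and the query counts are invented for illustration; it simply predicts the most popular known query that starts with what has been typed so far.

```python
from typing import Dict, Optional

# Invented popularity counts, for illustration only (not real search data).
QUERY_COUNTS: Dict[str, int] = {
    "credit cards": 90000,
    "credit cards for bad credit": 4000,
    "credit union savings accounts": 1500,
    "creative writing courses": 2000,
}


def predict_query(prefix: str, counts: Dict[str, int]) -> Optional[str]:
    """Return the most popular known query starting with `prefix`, or None."""
    prefix = prefix.lower()
    matches = [query for query in counts if query.startswith(prefix)]
    if not matches:
        return None
    # The highest count wins, which favours short, popular head terms.
    return max(matches, key=lambda query: counts[query])
```

With these made-up counts, typing "cre" already yields "credit cards", and the tail query "credit cards for bad credit" only wins once the searcher has typed "credit cards f". By then the head-term results have already been on screen for several keystrokes.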
Matt Cutts has given a very useful interview with Eric Enge, which rounds up a lot of information architecture and technical architecture issues. There's nothing really new here, but it's good to get all this info into one place and to see it confirmed by Matt. Topics covered:
crawl budget/indexation cap - the use of PageRank and host load to control crawl depth and frequency
the effect of duplicate content on PageRank
session IDs and affiliate IDs in links/URLs
faceted navigation - good to see Matt confirming that the advice I gave at SES London, and will be giving next week at SMX Munich, is all correct.
Different ideas for use of the rel=canonical tag
301 redirects and how they differ from 302 redirects
the Google Webmaster Tools (WMT) option to ignore parameters
PageRank Sculpting and its effectiveness in the modern world
JavaScript, IFRAME and PDF handling
Paid links and nofollow
Overall, the article strongly reinforces the fact that a successful site architecture is essential to SEO success.
Google's John Mueller has published a good article on working with multi-regional web sites. He confirms that country-code Top Level Domains (ccTLDs) are the best way to host multi-regional content. He also clears up some of the myths surrounding duplicate content on multi-regional domains, which is most welcome.
John doesn't mention that the same thinking applies even if you are targeting a single country. A ccTLD is the best way to indicate the location of your target market to search engines - and to that market itself, of course.
A URL gives you at least five places to target a country: domain (ccTLD), subdomain (de.domain.com), directory (www.domain.com/de/), path parameters (www.domain.com/;domain=de) and query parameters (www.domain.com/?domain=de). However, there are lots more axes along which the content could be split:
Category - Web, Enterprise, Social, Real Time
Context - Intranet, Library, Personal
Topic - Health, Travel, Jobs, etc.
Vertical - Finance, Education, Government, etc.
Platform - Desktop, Mobile, Television, Kiosk
Format - Text, Image, Audio, Video, Map
(Note: the above is slightly modified from a table provided by Search Patterns, an excellent read)
Given this number of ways of organising content, and the fact that the location and language of your target audience are major considerations (worthy of a major axis), in all but the most trivial cases a ccTLD is the obvious choice for geo-targeting. It's good to see official written confirmation of this from Google.
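The five URL locations listed above can be checked mechanically. A minimal Python sketch, assuming the country code "de" and the parameter name "domain" from the examples (both hypothetical placeholders); note that urllib.parse.urlparse, unlike urlsplit, separates the ;domain=de path parameter from the path:

```python
from typing import List
from urllib.parse import parse_qs, urlparse


def country_signals(url: str, country: str = "de", param: str = "domain") -> List[str]:
    """List which of the five URL locations carry `country` as a target marker.

    `param` is the hypothetical parameter name from the examples
    (;domain=de and ?domain=de).
    """
    parts = urlparse(url)  # urlparse splits ";params" off the last path segment
    labels = (parts.hostname or "").lower().split(".")
    signals = []
    if len(labels) > 1 and labels[-1] == country:
        signals.append("ccTLD")
    if len(labels) > 2 and labels[0] == country:
        signals.append("subdomain")
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0] == country:
        signals.append("directory")
    if country in parse_qs(parts.params).get(param, []):
        signals.append("path parameter")
    if country in parse_qs(parts.query).get(param, []):
        signals.append("query parameter")
    return signals
```

Only the ccTLD check needs nothing beyond the host name, which is one practical reason it leaves every other part of the URL free for the other axes listed above.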
My first impressions were that the ad is too negative. It doesn't show what Bing can do for you. It's at risk of associating Bing with information overload and distressed searchers. I'm also not convinced the phrase "decision engine" is a good one - too techie, too nebulous. Who's making the decisions - me, or Bing?
Compare it with Google's Superbowl ad:
This has its own potential problems - I'm not sure I would have been brave enough to use no voiceover whatsoever on a TV ad running in a £60,000 per second timeslot - but in general it's a much more upbeat ad showing someone achieving something - lots of things - using Google Search.
In Microsoft's position, I think I'd accept the fact that lots of people use Google and get good results lots of the time, and show that Bing is an alternative that often succeeds when Google fails. I'd challenge the notion that Google always delivers the right result, every time, and that if Google doesn't deliver it, it can't be on the Web. I'd get people to try Bing - that's all you can ask of the ad. One idea would be to use something based on the famous "Pepsi Challenge", but brought right up to date.
Having seen the interview with Ashley Highfield, I'm looking forward to more ads in the series. It would be great to see Bing achieve the double digit market share that he desires, but I think this was a bad start to the campaign.
It's SilverDisc's 17th Birthday today, so here's a free gift of an idea for Google, Yahoo and Microsoft to consider. Here at SilverDisc we're often having to install and test new conversion tracking code for our PPC clients. Usually this involves searching for one of our client's keywords on each search engine, clicking on it (thus incurring cost for the client) then going through the client's site, making a test purchase and, later, checking that all the analytics has worked. A cool feature that the search engines could add to improve efficiency would be a dummy campaign/ad-group/keyword that was automatically created by the engine itself within the PPC account specifically to test conversion tracking. The keyword could be assigned by the engine itself, and could be very long, cryptic and unique to each client account, e.g. g54fr89fdcdjasdoe84.
Searching for this keyword would always trigger the client's ad
Clicking this ad would not incur any real charges (although it may simulate a charge). Alternatively, a very low charge could be applied, e.g. £0.01.
Conversion tracking could work much faster for this one keyword, e.g. near-real-time, to allow better, faster testing
This would save loads of time within agencies and mean that client accounts were up and running sooner, making more money for both clients and search engines.
I'm still very troubled by this paid links issue after all these years!
I agree it's Google's right to penalise or promote any page/site in its natural listings, which represent Google's subjective opinion of relevancy.
However, the idea that all paid links are bad/"evil" is wrong in so many ways:
Paid links pre-date Google.
There is no machine-readable standard for labelling a paid link. I'll repeat that - there is no machine-readable standard for labelling a paid link.
Labelling paid links fails the "Does this make sense in the absence of search engines?" ethical test. The answer may well be "Yes". (Where the answer is "No", I agree paid links are spam).
Labelling paid links fails the "Would I do this if search engines did not exist?" test. In fact, you have to know that Google exists, and that they mind about paid links, in order to label those paid links in the non-standard way that Google asks you to label them. This is perhaps my biggest beef with Google's approach to paid links - they actually violate one of Google's published Webmaster principles.
What does "paid" mean anyway? An actual exchange of cash? If you look at the top results for any hugely commercial field, say "car insurance", it's hard to believe that there is no commercial influence in the results! When all that a company does is commercial, then every link (positive or negative) to that company's site is commercial in nature.
I understand that a market in paid links arose because of Google's algorithm.
However, the irony is that in responding to that market by asking all publishers to label paid links in a non-standard way, Google violated its own principles. It started to ask publishers to adapt what they published to suit Google (because Google existed), and called them spammers if they didn't. That's the wrong way around. It's the spammers that do stuff purely because Google exists!
It appears that, some time ago, Google removed details of results prefetching from its Webmaster guidelines while continuing to implement results prefetching in its search results.
If you haven't a clue what I'm talking about, the Wayback Machine has the original Google Webmaster help on this topic, which I'll paste here verbatim in order to make it searchable (Wayback Machine pages aren't indexed by search engines):
Results Prefetching Questions
1. What is "results prefetching," and how does it impact my site?
On some searches, Google uses a special <link> tag supported by Firefox and Mozilla to instruct the browser to download the top search result before the user clicks on the result. When the user clicks on the top result, the destination page will load faster than before. This tag is only inserted when it is likely that the user will click on the first link.
For example, when a Firefox user searches for [stanford], Google includes the following tag in the results HTML:
Prefetching may impact your site because the prefetch request will happen whether or not the user clicks on the result, so it may result in additional traffic to your web server. Google only inserts this tag when there is a high likelihood that the user will click on the top result, but clearly this heuristic is not right 100% of the time.
2. Can I distinguish prefetch requests from normal requests?
3. I want to block/ignore prefetch requests. What should I do?
To block or ignore prefetch requests (from Google and other web sites), you should configure your web server to return a 404 HTTP response code for requests that contain the "X-moz: prefetch" header.
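The advice above is to do this in the web server configuration. As an application-level alternative, here is a minimal sketch of a Python WSGI middleware (WSGI is Python's standard web application interface, PEP 3333) that returns a 404 for any request carrying the "X-moz: prefetch" header. The function name is hypothetical, and comparing the header value case-insensitively is an assumption.

```python
from typing import Callable, Iterable


def block_prefetch(app: Callable) -> Callable:
    """Wrap a WSGI app so requests sent with "X-moz: prefetch" get a 404."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the X-moz request header under the key HTTP_X_MOZ.
        if environ.get("HTTP_X_MOZ", "").strip().lower() == "prefetch":
            body = b"Not Found"
            start_response("404 Not Found", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else is passed through to the real application.
        return app(environ, start_response)
    return wrapped
```

Wrapping at this level keeps prefetches out of the application entirely, but they will still appear in the raw access logs as 404s, which is worth knowing when reading those logs.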
What else do you need to know about results prefetching?
If you run Google Analytics or another JavaScript-based analytics package, you won't see these prefetched pages in your analytics. That's because only the HTML is prefetched, not the images, JavaScript, etc. referenced by that HTML, which means that the Analytics JavaScript is never even fetched, let alone executed. You need to look at raw log files to see prefetched pages.
Google only issues the prefetch code when they are very confident that searchers will click on the #1 result (as in their example, a search for stanford). Most times, particularly for more "normal" sites (i.e. not Stanford), Google won't issue the code. So you may never see this on your own site.
However, it's worth being aware of this issue because if you do see a prefetch in your raw logs you'll want to know why; and because, depending on how you calculate conversions, the fact that a page is prefetched but never viewed by a searcher may significantly affect your conversion tracking and monetisation on that page. I'm surprised that Google removed this info from their Webmaster help.