29th February 2008
When should "NOINDEX" mean "INDEX"?
Matt Cutts has stirred up a little hornets' nest with his "What should NOINDEX do?" post. Matt reckons the topic will be colossally boring to some people - but not to me. For some reason I find Robots standards fascinating. Yep, I know I'm weird.
The crux of Matt's issue is ...
The question is whether Google should completely drop a NOINDEX’ed page from our search results vs. show a reference to the page, or something in between?
The obvious response is to completely drop the NOINDEX'ed page. NOINDEX is made up of the two words NO and INDEX, so it means "do not index", right?
Maybe not. It's important to be precise here. What exactly does NOINDEX mean?
Often when talking about indexing issues, it's useful to separate in your mind the indexing of a URL from the indexing of the content at that URL. This concept is particularly important in the contexts of URL canonicalization, duplicate content and ... robots standards. I'll restrict this discussion to the NOINDEX part of the robots standards, but an equally interesting discussion exists around robots.txt too.
Once we separate URL and content, the question "What exactly does NOINDEX mean?" can be answered in several ways:
1) Index the URL but not the content
2) Don't index the URL or the content
3) (Somehow, not sure how!) index the content but not the URL
One thing is for sure ... it does not mean index both the content and the URL. :D
In my opinion NOINDEX should definitely mean "Don't index the content". Definitely. No question.
The question of whether it should mean "Don't index the URL" is an interesting one. There are arguments both ways. In my experience, however, there are many cases where it should mean "Don't index the URL". In these cases, indexing the URL would result in something bad happening for searchers, for the site owner, or for both. Therefore, generally, I think it should mean "Don't index the URL".
However, there is one specific case where I think it would be acceptable to index the URL, and where doing so would very often benefit both searchers and site owners. That specific case is when the URL is the home page of the site.
Taking the three "problem" URLs cited by Matt in his post:
If high-profile sites like
aren’t showing up in Google because of the NOINDEX meta tag, that’s bad for users
These three URLs are all actually home pages. The second and third URLs are obviously so. The first URL is the result of a couple of 302 redirects:
- http://www.police.go.kr/ is a 302 to http://www.police.go.kr/index.jsp
- http://www.police.go.kr/index.jsp is a 302 to http://www.police.go.kr/main/index.do
This makes http://www.police.go.kr/main/index.do the home page of the site. The way Google works (correctly IMO) is that a redirect from "/" to a deeper page on a site would normally result in the content of that deeper URL being indexed under "/".
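To make the "home page" idea concrete, here is a minimal Python sketch of how that check might look. It is only an illustration under my own assumptions, not a description of how Google does it: the function name is_home_page, the redirects lookup table and the max_hops limit are all made up for the example, and the table stands in for actually fetching each URL and reading its redirect target.

```python
from urllib.parse import urlsplit


def is_home_page(url, redirects, max_hops=5):
    """Return True if url is the site root, or is reached by following
    redirects that start at the site root.

    redirects is a hypothetical lookup table (source URL -> target URL)
    used here instead of making real HTTP requests.
    """
    parts = urlsplit(url)
    # "/" (or an empty path) with no query string is the root itself.
    if parts.path in ("", "/") and not parts.query:
        return True

    current = f"{parts.scheme}://{parts.netloc}/"
    # Follow the chain from "/" for a bounded number of hops, so a
    # redirect loop cannot keep us here forever.
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return False  # the chain ended without reaching url
        if target == url:
            return True
        current = target
    return False


# The two 302 redirects described above.
redirects = {
    "http://www.police.go.kr/": "http://www.police.go.kr/index.jsp",
    "http://www.police.go.kr/index.jsp": "http://www.police.go.kr/main/index.do",
}

print(is_home_page("http://www.police.go.kr/main/index.do", redirects))
```

The path /some/other/page in the test below is an invented example of a URL that "/" does not redirect to, so it would not count as a home page.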
So, I think a reasonable middle ground, that satisfies the best interests of searchers, site owners and search engine, would be the following:
- Do not index the content.
- Do not link to the URL in the search results, unless the URL is a “home page” (/, or redirected to by /).
- If it is a home page with a NOINDEX tag, it’s OK to link to it in the SERPs, but do not index the content; do not provide a snippet; and do not provide a cached copy. Treat it like a “partially indexed page”.
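The three rules above can be sketched as a small decision function. This is just an illustration of my proposal, not how Google or any other search engine actually behaves. The names Listing and listing_for are invented for the example, and I am assuming that a page without a NOINDEX tag is indexed and displayed in the normal way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """What a search engine may do with a page (hypothetical model)."""
    index_content: bool
    link_in_results: bool
    show_snippet: bool
    show_cached_copy: bool


def listing_for(has_noindex, is_home_page):
    """Apply the proposed middle-ground rules to a single page."""
    if not has_noindex:
        # Assumption: no NOINDEX means normal, full treatment.
        return Listing(True, True, True, True)
    # NOINDEX: the content is never indexed, so there is no snippet and
    # no cached copy. The bare URL may be linked only for a home page.
    return Listing(
        index_content=False,
        link_in_results=is_home_page,
        show_snippet=False,
        show_cached_copy=False,
    )


# A NOINDEX'ed home page becomes a "partially indexed page".
print(listing_for(has_noindex=True, is_home_page=True))
```

The point of writing it this way is that NOINDEX always wins for the content; the home page exception only ever affects whether the URL itself may appear.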