March 19, 2009 Office Hours

Have you noticed the disclaimer at the beginning of the show? It says the opinions are those of the host and don’t reflect the views of Webmaster Radio. Really? Vanessa is offended.

Something new in the chatrooms: the chat is migrating to tweetchat.com. Join the webmasterradio room and log in with your Twitter account.

This episode talks a little bit about robots files and controlling search engine bots’ access to your site. It’s more complicated than it appears at first glance.

Kerry (SMX West biggest search geek competition) has a question: How is the noindex directive in the robots file used, and is Google using it?
The standard directive in a robots file is disallow, which tells search engine bots not to crawl that set of pages. But not being able to crawl a page doesn’t mean not being able to index it. Site owners don’t always understand this.
Say a search engine crawls a page it is allowed to crawl and extracts some links from that page. The search engine treats every URL it finds there as eligible for indexing. So if one of those links points to a page that is disallowed, the search engine already has the URL and might still index it without crawling the content. This used to be known as a partially indexed URL. Over time search engines started to take the anchor text and use that, or go to DMOZ and use the description there, etc., so it would look like an indexed page.
Site owners didn’t like this, so they started using a meta noindex tag, but it tended to be a lot of overhead for sites and wasn’t scalable. So Google started to obey a noindex directive in the robots file, which operates the same as the meta tag with less overhead. Vanessa is not sure whether Microsoft and Yahoo! support this, but Google does.
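
The crawl-versus-index point can be sketched in a few lines of Python. This is a toy model, not how any search engine actually works: example.com, the SITE dictionary and build_index are made up for illustration, and only urllib.robotparser is a real standard-library module. It shows a disallowed URL ending up "known" with anchor text even though its content is never fetched.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots file for example.com: everything under /private/ is off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical site: each page maps to the (anchor text, URL) links found on it.
SITE = {
    "http://example.com/": [
        ("Quarterly report", "http://example.com/private/report.html"),
        ("About us", "http://example.com/about.html"),
    ],
    "http://example.com/about.html": [],
    "http://example.com/private/report.html": [],
}


def build_index(start_url):
    """Crawl from start_url and return {url: {"crawled": bool, "anchors": [...]}}."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())

    index = {}
    queue = [start_url]
    seen = set()
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        entry = index.setdefault(url, {"crawled": False, "anchors": []})
        if not robots.can_fetch("*", url):
            # Disallowed: the URL is known, but its content is never fetched.
            continue
        entry["crawled"] = True
        for anchor, target in SITE.get(url, []):
            # A link on a crawled page makes the target URL known,
            # whether or not the target itself may be crawled.
            known = index.setdefault(target, {"crawled": False, "anchors": []})
            known["anchors"].append(anchor)
            queue.append(target)
    return index


index = build_index("http://example.com/")
for url, entry in index.items():
    print(url, entry)
```

In this sketch the /private/ URL has crawled set to False but still carries the anchor text from the linking page, which is roughly the "partially indexed URL" situation described above.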

If you use the meta noindex tag on the page itself, don’t also disallow that page in the robots file, because search engine bots will read the robots file first, skip the page, and never see the meta tag at all.
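
Here is a rough sketch of why the two don’t mix. RobotsMetaParser and may_appear_in_index are hypothetical helpers written for this post, not part of any search engine; the only real library used is Python’s html.parser. The assumption baked in is the one from the show: a bot that is blocked from crawling never gets to read the page’s meta tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Looks for <meta name="robots" content="... noindex ..."> in a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        directives = [d.strip().lower() for d in content.split(",")]
        if "noindex" in directives:
            self.noindex = True


def may_appear_in_index(html, crawl_allowed):
    """Hypothetical decision: could this URL still show up in the index?"""
    if not crawl_allowed:
        # The page is never fetched, so the meta tag is invisible to the bot.
        # The bare URL can still be indexed from links elsewhere.
        return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return not parser.noindex


PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body>Hi</body></html>'

print(may_appear_in_index(PAGE, crawl_allowed=True))
print(may_appear_in_index(PAGE, crawl_allowed=False))
```

With crawling allowed the noindex is seen and respected; with crawling blocked the same page can still surface as a URL-only listing.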

We’ve been talking about the new canonical tag, a link element you put in the head of your page. If you previously used disallow pattern matching in the robots file to block parameter URLs (like sort=best or something), you need to go and unblock that pattern in the robots file, or search engine bots will never crawl the parameter versions, never see the canonical tag on them, and never associate links to the parameter versions with the canonical URL. You will lose that PageRank value.
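
A simplified sketch of that reasoning, under loud assumptions: CanonicalParser and credited_url are made-up helpers, the example.com URLs are invented, and real link consolidation is certainly more involved than this. The only thing it models is that a canonical hint can be used only if the page carrying it gets crawled.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Extracts the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def credited_url(url, html, blocked_by_robots):
    """Hypothetical: which URL gets the credit for links pointing at `url`?"""
    if blocked_by_robots:
        # The page is never fetched, so the canonical hint is never seen.
        return url
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical or url


PARAM_URL = "http://example.com/shoes?sort=best"
PARAM_PAGE = (
    '<html><head><link rel="canonical" href="http://example.com/shoes">'
    "</head><body>Shoes</body></html>"
)

print(credited_url(PARAM_URL, PARAM_PAGE, blocked_by_robots=True))
print(credited_url(PARAM_URL, PARAM_PAGE, blocked_by_robots=False))
```

While the sort= pattern stays blocked, links to the parameter version stay stuck on the parameter version; once it is unblocked, the canonical URL can pick them up.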

What is Vanessa’s favorite color?
Vanessa doesn’t have one even though you probably think it’s blue.

Kerry’s follow-up question: Can you use wildcards in the noindex directive? Yes, Vanessa thinks so.
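
For reference, here is a small sketch of the wildcard style Google supports in robots file patterns: * matches any run of characters and a trailing $ anchors the end of the URL. Vanessa only thinks the noindex directive accepts wildcards, so treating it the same way as disallow patterns is an assumption here. pattern_to_regex and matches are hypothetical helper names, not anything Google publishes.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots-style pattern (with * and optional trailing $) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, path):
    """True if the URL path (including any query string) matches the pattern."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*sort=", "/shoes?sort=best"))
print(matches("/*.pdf$", "/docs/guide.pdf"))
print(matches("/*.pdf$", "/docs/guide.pdf?download=1"))
```

Patterns without a trailing $ behave as prefix matches, which is why "/*sort=" catches any URL with sort= somewhere after the leading slash.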

Pingbacks: the good, the bad and the ugly. In terms of links for PageRank, they would be handled in the same way as links from a comment on a blog. Vanessa doesn’t know how much they are counted, but they probably don’t carry much value. Google’s blog supports pingbacks. Should you always disapprove the ones that come from scrapers? Yes, it’s spam. Some are valid, but Vanessa doesn’t approve those either.

Does the Webmaster Tools API include functionality for things such as crawl reports and errors? Yes, you can use it to submit a sitemap, verify your site, etc., but Google has not yet implemented the ability to pull reports such as crawl error reports. Let’s all ask the Google webmaster team to add that in!

Listen to the Podcast at WebmasterRadio.FM

  • Office Hours

    Half hour. Every week. Any question you have.

    Thursdays at 1pm Pacific on Webmaster Radio. Listen live or check back here each week for the podcast.

    You can also subscribe to the podcast on iTunes.

    Office Hours With Vanessa Fox on Webmaster Radio