Robots.txt: how to allow crawling of specific URLs while excluding similar ones


I have these types of URLs:

www.example.com/view/a-dF-g3_dG
www.example.com/view/a-K5gD2?%f
www.example.com/view/a-b3R/%s_2

So basically they start with /view/a- and continue with random characters. I want to block Google from crawling them.

However, there is one exception. I have a URL which looks like this:

www.example.com/view/a-home

This should be an exception; this URL should still be crawled. How can I do this?

CodePudding user response:

This won't work for all bots, but the major search engines now support both Disallow and Allow directives:

User-agent: *
Disallow: /view/a-
Allow: /view/a-home

Note that the paths must start with /view/, since robots.txt rules are matched against the full URL path. The longest matching rule is the one that gets used. For /view/a-home both rules match, but the Allow rule is longer, so it wins. For /view/a-dF-g3_dG, only the Disallow rule matches.
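
To see why longest-match evaluation produces these results, here is a minimal Python sketch of that logic (the rule list and the is_allowed function are made-up names for this illustration; real crawlers implement this internally):

# Minimal sketch of longest-match evaluation as Google applies it.
# RULES and is_allowed() are hypothetical names for this example.
RULES = [
    ("Disallow", "/view/a-"),      # 8-character prefix
    ("Allow", "/view/a-home"),     # 12-character prefix
]

def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under longest-match semantics."""
    best = None  # (prefix_length, directive) of the best match so far
    for directive, prefix in rules:
        if path.startswith(prefix):
            longer = best is None or len(prefix) > best[0]
            # On a tie in length, Google's spec prefers the Allow rule.
            tie_allow = best is not None and len(prefix) == best[0] and directive == "Allow"
            if longer or tie_allow:
                best = (len(prefix), directive)
    # A path matched by no rule at all is allowed by default.
    return best is None or best[1] == "Allow"

print(is_allowed("/view/a-home"))      # True: Allow (12 chars) beats Disallow (8)
print(is_allowed("/view/a-dF-g3_dG"))  # False: only the Disallow rule matches
print(is_allowed("/contact"))          # True: no rule matches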

Bots that don't understand the Allow: directive would be unable to crawl the /view/a-home page.

You can use Google's robots.txt tester tool to make sure that your robots.txt syntax is correct and that specific URLs are either disallowed or allowed as you expect.
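
If you would rather check locally, one option (an assumption here, not part of the original answer) is the third-party Protego parser, which follows Google's longest-match rules:

# Assumes the third-party `protego` package (pip install protego).
from protego import Protego

robotstxt = """\
User-agent: *
Disallow: /view/a-
Allow: /view/a-home
"""

rp = Protego.parse(robotstxt)
print(rp.can_fetch("https://www.example.com/view/a-home", "*"))      # True
print(rp.can_fetch("https://www.example.com/view/a-dF-g3_dG", "*"))  # False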
