I have these types of URLs:
www.example.com/view/a-dF-g3_dG
www.example.com/view/a-K5gD2?%f
www.example.com/view/a-b3R/%s_2
So basically they start with /view/a- and continue with random characters. I want to block Google from crawling them.
However, there is one exception. I have a URL which looks like this:
www.example.com/view/a-home
This should be an exception; this URL should still be crawled. How can I do this?
CodePudding user response:
This won't work for all bots, but the major search engines now support both Disallow and Allow directives:
User-Agent: *
Disallow: /view/a-
Allow: /view/a-home
The longest matching rule is the one that gets used. For /view/a-home both rules match, but the rule that allows it is longer, so it is used. For /view/a-dF-g3_dG, only the disallow rule matches.
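If you want to sanity-check that longest-match behavior locally, here is a minimal Python sketch. It is not Google's actual matcher, just an illustration of the longest-match rule described above; the rule list and test paths are assumptions based on the example URLs in the question.

# Minimal sketch of longest-match robots.txt resolution (illustrative only).
# Rules are (directive, path-prefix) pairs, as in the robots.txt above.
RULES = [
    ("Disallow", "/view/a-"),
    ("Allow", "/view/a-home"),
]

def is_allowed(path: str) -> bool:
    # Collect every rule whose path prefix matches the start of the URL path.
    matches = [(directive, prefix) for directive, prefix in RULES
               if path.startswith(prefix)]
    if not matches:
        return True  # no rule matches: crawling is allowed by default
    # The longest matching prefix wins (this sketch ignores tie-breaking,
    # which does not arise with these two rules).
    directive, _ = max(matches, key=lambda rule: len(rule[1]))
    return directive == "Allow"

print(is_allowed("/view/a-home"))      # True  -> still crawled
print(is_allowed("/view/a-dF-g3_dG"))  # False -> blocked
print(is_allowed("/view/other"))       # True  -> no rule applies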
Bots that don't understand Allow directives would be unable to crawl your /view/a-home page.
You can use Google's robots.txt tester tool to make sure that your robots.txt syntax is correct and that specific URLs are either disallowed or allowed as you expect.