How to block yandex

Time:07-29

I'm trying to block yandex from my site. I've tried the solutions posted in other threads but they are not working so I'm wondering if I am doing something wrong?

The user-agent string is:

    Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

I have tried each of the following, one at a time (RewriteEngine is on):

    SetEnvIfNoCase User-Agent "^yandex.com$" bad_bot_block
    Order Allow,Deny
    Deny from env=bad_bot_block
    Allow from ALL

    SetEnvIfNoCase User-Agent "^yandex.com$" bad_bot_block
    <RequireAll>
    Require all granted
    Require not env bad_bot_block       
    </RequireAll>

Can anyone see a reason one of the above won't work or have any other suggestions?

CodePudding user response:

    SetEnvIfNoCase User-Agent "^yandex.com$" bad_bot_block

With the start and end-of-string anchors in the regex, you are basically checking that the User-Agent string is exactly equal to "yandex.com" (except that the unescaped . matches any character), which clearly does not match the stated user-agent string.

You need to check that the User-Agent header contains "YandexBot" (or "yandex.com"). You can also use a case-sensitive match here, since the real Yandex bot does not vary the case.

For example, try the following instead:

    SetEnvIf User-Agent "YandexBot" bad_bot_block

Consider using the BrowserMatch directive instead, which is a shortcut for SetEnvIf User-Agent.
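As a minimal sketch of the BrowserMatch form, assuming mod_setenvif is loaded and reusing the bad_bot_block variable name from the question, the line above could be written as:

    # Equivalent to: SetEnvIf User-Agent "YandexBot" bad_bot_block
    BrowserMatch "YandexBot" bad_bot_block

BrowserMatch is case-sensitive, like SetEnvIf. BrowserMatchNoCase is the case-insensitive counterpart, should you want one.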

If you are on Apache 2.4 then you should be using the Require (second) variant of your two code blocks. The Order, Deny and Allow directives are Apache 2.2 syntax and are formally deprecated on Apache 2.4.
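Putting the pieces together, a complete Apache 2.4 block might look something like the following. This is only a sketch: it assumes the directives are allowed in your context (for example, in .htaccess with a suitable AllowOverride setting) and that no other authorization rules on the site interfere.

    # Flag any request whose User-Agent contains "YandexBot" (no anchors)
    SetEnvIf User-Agent "YandexBot" bad_bot_block

    # Apache 2.4+: grant access to everyone except flagged requests
    <RequireAll>
        Require all granted
        Require not env bad_bot_block
    </RequireAll>

Blocked requests should then receive a 403 Forbidden response, which you can confirm in the access log.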

However, consider using robots.txt instead to block crawling in the first place. Yandex supposedly supports robots.txt.

CodePudding user response:

In case anyone else has this problem, the following worked for me:

    RewriteCond %{HTTP_USER_AGENT} ^.*(yandex).*$ [NC]
    RewriteRule .* - [F,L]