I'm trying to block Yandex from my site. I've tried the solutions posted in other threads, but they are not working, so I'm wondering whether I am doing something wrong.
The user-agent string is:
Mozilla/5.0 (compatible; YandexBot/3.0; http://yandex.com/bots)
I have tried each of the following two blocks (one at a time). RewriteEngine is on.
SetEnvIfNoCase User-Agent "^yandex.com$" bad_bot_block
Order Allow,Deny
Deny from env=bad_bot_block
Allow from ALL
SetEnvIfNoCase User-Agent "^yandex.com$" bad_bot_block
<RequireAll>
Require all granted
Require not env bad_bot_block
</RequireAll>
Can anyone see a reason one of the above won't work or have any other suggestions?
CodePudding user response:
SetEnvIfNoCase User-Agent "^yandex.com$" bad_bot_block
With the start-of-string (^) and end-of-string ($) anchors in the regex, you are basically checking that the User-Agent string is exactly equal to "yandex.com" (except that the unescaped . matches any character), which clearly does not match the stated user-agent string.
You need to check that the User-Agent header contains "YandexBot" (or "yandex.com"). You can also use a case-sensitive match here, since the real Yandex bot does not vary the case.
For example, try the following instead:
SetEnvIf User-Agent "YandexBot" bad_bot_block
Consider using the BrowserMatch directive instead, which is a shortcut for SetEnvIf User-Agent.
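As a rough sketch of that shortcut, the SetEnvIf line above could be written with BrowserMatch as follows. This assumes mod_setenvif is loaded and reuses the bad_bot_block variable name from the question:

```apache
# BrowserMatch tests the User-Agent header only, so the header name is omitted.
# The pattern is unanchored and case-sensitive: it matches "YandexBot" anywhere in the string.
BrowserMatch "YandexBot" bad_bot_block
```

If you did want a case-insensitive match, BrowserMatchNoCase is the corresponding variant.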
If you are on Apache 2.4 then you should be using the Require (second) variant of your two code blocks. The Order, Deny and Allow directives are Apache 2.2 directives and are formally deprecated on Apache 2.4.
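Putting the pieces together, a complete Apache 2.4 block might look something like this. It is only a sketch: it assumes mod_setenvif and mod_authz_core are available, that the directives go in .htaccess or a <Directory> container, and that no other Require directives in the same context conflict with it:

```apache
# Flag any request whose User-Agent contains "YandexBot" (no anchors, case-sensitive).
SetEnvIf User-Agent "YandexBot" bad_bot_block

# Grant access to everyone except requests flagged above; those receive a 403.
<RequireAll>
    Require all granted
    Require not env bad_bot_block
</RequireAll>
```

The only change from the second block in the question is the regex: the ^ and $ anchors are gone, so the pattern can match in the middle of the header.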
However, consider using robots.txt instead to block crawling in the first place. Yandex supposedly supports robots.txt.
CodePudding user response:
In case anyone else has this problem, the following worked for me:
RewriteCond %{HTTP_USER_AGENT} ^.*(yandex).*$ [NC]
RewriteRule .* - [F,L]
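For what it's worth, the same mod_rewrite rule can probably be written more simply. A regex without anchors already matches anywhere in the string, so the surrounding ^.* and .*$ add nothing, and the F flag implies L. A minimal sketch, assuming RewriteEngine is already on as in the question:

```apache
# Match "yandex" anywhere in the User-Agent header, ignoring case.
RewriteCond %{HTTP_USER_AGENT} yandex [NC]
# Respond with 403 Forbidden for every URL; no substitution is performed.
RewriteRule ^ - [F]
```

Note that this is broader than matching "YandexBot" exactly, since any user agent containing "yandex" in any case is blocked.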