blocking crawlers on specific directory

Time:02-19

I have a situation similar to a previous question that uses the following in the accepted answer:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]

However, the rules from that answer seem to block access to everything, including the homepage. I only want to block the crawlers from specific directories, such as:

  • www.example.com/tbd_templates/

  • www.example.com/custom_post/

What I really need is to return a 403 status code for the directories I specify (/tbd_templates/, /custom_post/, etc.) while still allowing access to the rest of the site.

My .htaccess is:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

Can anyone help me?

CodePudding user response:

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]

As mentioned in the linked answer, this code would need to go in the .htaccess file inside the directory you are trying to protect, so that it applies only to requests within that directory (the .* regex matches everything there).
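A minimal sketch of that per-directory approach, assuming the protected directory is /tbd_templates/ from the question, that it exists as a real directory on disk, and that .htaccess overrides are enabled for it:

```apache
# File: /tbd_templates/.htaccess (location assumed for illustration)
# RewriteEngine is repeated so this file does not rely on the parent .htaccess.
RewriteEngine On

# Match the listed crawlers by User-Agent, case-insensitively.
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
# Respond with 403 to every request under this directory.
RewriteRule .* - [R=403,L]
```

Note that this only works when the directory physically exists. If WordPress serves the path virtually through index.php, there is no directory to place the file in.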

However, that is impractical if you need to protect several directories. In this case you should change the RewriteRule pattern to target the specific subdirectories you want to protect (also touched on in the linked answer).

For example, the following would need to go before the WordPress code block (i.e., before the # BEGIN WordPress comment marker). You do not need to repeat the RewriteEngine directive, since it already occurs later in the file.

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule ^(tbd_templates|custom_post)($|/) - [F]

The first argument to the RewriteRule directive (the pattern) is a regular expression. In .htaccess it is matched against the requested URL-path relative to the directory containing the .htaccess file, so without the leading slash.

The regex ^(tbd_templates|custom_post)($|/) matches requests for /tbd_templates or /custom_post (using regex alternation) or /tbd_templates/<anything> or /custom_post/<anything>.

The F flag is short for R=403. The L flag is not required here because it is implied when using F (or R=403).
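Putting the pieces together, the root .htaccess might look like the sketch below. It simply combines the blocking rule with the WordPress block from the question; the comments are added for illustration.

```apache
# Block the listed crawlers from two directories only (403 Forbidden).
# Placed before the WordPress block so it is evaluated before the
# front-controller rewrite to index.php.
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule ^(tbd_templates|custom_post)($|/) - [F]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

One way to check the result, assuming curl is available, is to send a request with a spoofed user agent, e.g. curl -I -A "Googlebot" https://www.example.com/tbd_templates/ which should return 403, while the same request for the homepage should not.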
