My site gets a lot of 404 hits for PDF URLs that have never existed on the site. They all follow the pattern spammy-subject.pdf. I get tens of these per day, which is much higher than genuine site traffic.
I'm currently adding 410 rewrites for each.
Can I use an .htaccess rule to totally block this traffic from reaching the site, before it becomes a 404?
CodePudding user response:
Can I use htaccess rule to totally block this traffic from reaching this site?
You can use .htaccess to prevent the request from being routed through a CMS such as WordPress, Joomla, etc. that uses a front-controller pattern, if that's what you mean by "site". However, the request has already reached your server by the time the .htaccess file is processed, so doing anything in .htaccess isn't necessarily going to help a "static site".
If you are already returning a 404 (or 410) - before it reaches your site - then the issue is already resolved.
The only potential issue is if the requests are being routed through your CMS and the 404 is being triggered by your CMS, not Apache. This would suggest the directives are in the wrong place in your .htaccess file (or not present at all). Blocking directives like this need to be at the top of your .htaccess file, before any existing rewrites.
For example:
# Prevent 404 request being routed unnecessarily through CMS
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.pdf$ - [NC,R=404]
There's no advantage to serving a 410 Gone instead of a 404 unless these files previously existed and you are trying to remove them from search engines (or telling 3rd parties they no longer exist).
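If the files had previously existed and a 410 really were wanted, a single pattern-based rule could replace the per-URL 410 rewrites mentioned in the question. This is only a sketch, assuming every unwanted URL ends in .pdf and that no genuine PDF is served through the CMS; the G flag is mod_rewrite's shorthand for a 410 Gone response.

```apache
# Sketch only: respond 410 Gone to any .pdf request that does not map to a real file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.pdf$ - [NC,G]
```

As with the 404 version, this would need to sit above any existing rewrites so that Apache answers the request before the CMS sees it.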
UPDATE:
Should this code be at the very top or after the opening Wordpress rule: RewriteEngine On ?
It needs to be at the very top, before the # BEGIN WordPress comment marker (you should avoid manually editing the code in the WordPress section since WordPress itself maintains this section and your edits will be overwritten).
Yes, this is before the RewriteEngine On directive. You do not need to repeat the RewriteEngine directive. The location of the RewriteEngine directive does not actually matter. If there are multiple instances of this directive in the file then the last instance wins and controls the entire file. (This gives you a quick way to effectively disable all the mod_rewrite directives in the file: simply place a RewriteEngine Off directive at the very end.)
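To make the placement concrete, the resulting file might look something like the sketch below. The WordPress section shown is the commonly generated default and is an assumption here; the block on your install may differ, and it should be left exactly as WordPress wrote it.

```apache
# Custom rule: placed above the WordPress section so it is processed first.
# No RewriteEngine directive is needed here; the one further down applies to the whole file.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.pdf$ - [NC,R=404]

# BEGIN WordPress
# (Assumed default block, maintained by WordPress. Do not edit by hand.)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

With this layout, a request for a non-existent .pdf file is answered by Apache with a 404 and never reaches the WordPress front-controller rule.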