Why block user-agents in .htaccess?


I just realized that a user-agent (HetrixTools) was requesting my Apache server every 10 seconds... This generates a lot of traffic for nothing.

I was advised to block it in the .htaccess, which I did.

But I have 2 questions about this:

  • Is it really safe, since the user-agent can be modified (in curl, for example)?
  • Does it really help limit the load on the server? The request is still made; only the subsequent processing (connecting to the database, etc.) is skipped.

Thanks.

CodePudding user response:

Is it really safe, since the user-agent can be modified (in curl, for example)?

It is "safe", but whether it is reliable is another matter. Blocking a bot based on the User-Agent is obviously dependent on that bot sending a reliable User-Agent header.

The "HetrixTools Uptime Monitoring Bot" appears to be a "good" bot and they do publish the User-Agent they use. So I would have no reason to think this would not be reliable.

They also publish all the IP addresses they use to crawl from, so this is another way to block the bot (as they suggest) - but there are quite a lot of IP addresses and this isn't necessarily a static list.
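For example, with Apache 2.4 you could deny those addresses in .htaccess with something like the sketch below. The IPs shown are placeholders from the documentation ranges, not HetrixTools' real addresses - substitute the list they publish:

    # Allow everyone except the listed monitoring IPs
    <RequireAll>
        Require all granted
        Require not ip 192.0.2.10
        Require not ip 198.51.100.0/24
    </RequireAll>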

Note that if you are the website owner then they do suggest that you contact them in order to "opt out" - this might be the most reliable method to prevent all requests.

They don't mention using robots.txt - ordinarily, this would be the preferred approach to block "good" bots.
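If the bot does honour robots.txt, the entry would be as simple as the following sketch. The User-agent token here is an assumption - it must match the product name the bot actually identifies itself with:

    # Hypothetical robots.txt entry
    User-agent: HetrixTools
    Disallow: /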

Does it really help limit the load on the server? The request is still made; only the subsequent processing (connecting to the database, etc.) is skipped.

The request is still reaching your server, but it is being blocked before it consumes as many server resources, so yes it does "limit the load". I'm assuming you are serving a minimal text-only response in this case and not serving the "blocked response" through your server-side application/CMS!
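One way to check what the blocked response actually costs is to spoof the User-Agent yourself and confirm you get a short 403 rather than a full page rendered by your application. The User-Agent string and URL below are placeholders:

    # Send a request pretending to be the bot and inspect the response headers
    curl -I -A "HetrixTools Uptime Monitoring Bot" https://example.com/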

However, depending on the type of uptime monitor this bot is using, server resource usage could already be minimal. You say that it "generates a lot of traffic for nothing" - but is this "traffic" actually consuming a lot of resources? What type of traffic is this?

To prevent the request from even reaching your server you would need to implement a hardware firewall that sits in front of your application server.
