htaccess - Prevent Hotlinking/Webpage Scraping & Redirect Attacker's Webpage to Warning Page

Time:09-27

Malicious website owners are using the contents of our website (say, example.com) on their own websites (say, spam.com), like this:

<?php
$url='https://example.com/';
// using file() function to get content
$lines_array=file($url);
// turn array into one variable
$lines_string=implode('',$lines_array);
//output, you can also save it locally on the server
echo $lines_string;
?>

We want to prevent the contents of our website from displaying on their websites and redirect those requests to a warning page on our website (to a webpage and not an image).

After doing some R&D, we tried doing this:

<IfModule mod_rewrite.c>
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^https://example\.com/.*$ [NC]
    RewriteRule ^(.*) https://example.com/404 [R=301,L]
</IfModule>

But it doesn't work. What are we doing wrong?

Reference: htaccess prevent hotlink also prevents external links

CodePudding user response:

"Hotlinking" and "webpage scraping" are two very different things. What you describe with the snippet of simplified PHP code is a form of "webpage scraping" or even "cloning". This does not (or is very unlikely to) generate a Referer header in the request, so it cannot be blocked by simply checking the Referer (i.e. the HTTP_REFERER server variable) as you would do with "hotlinking".

(Your example mod_rewrite code blocks "hotlinking", not "scraping/cloning".)

The only way to block these types of requests is to block the IP address of the server making the request. For example, if the "malicious" requests are coming from 203.0.113.111 then you would do something like the following in the Apache 2.4 config (or .htaccess file) to block such requests:

<RequireAll>
    Require all granted
    Require not ip 203.0.113.111
</RequireAll>
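
The block above answers the offending client with a 403. If you would rather send that client to a warning page, as the question asks, a rough mod_rewrite sketch might look like the following. It assumes mod_rewrite is enabled and that a page exists at /warning.html; that path is a placeholder and is not taken from the question.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match only the (example) address seen making the scraping requests
    RewriteCond %{REMOTE_ADDR} =203.0.113.111
    # Skip the warning page itself to avoid a redirect loop
    RewriteCond %{REQUEST_URI} !^/warning\.html$
    # /warning.html is a hypothetical path; substitute your own page
    RewriteRule ^ /warning.html [R=302,L]
</IfModule>
```

A temporary (302) redirect is used here so that the redirect is not cached permanently if the address is later unblocked. Use this in place of the Require block rather than alongside it, since a denied client would never get as far as the warning page.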

However, the requests may not be coming from the same IP address that is hosting the "cloned" content. You'll need to determine this from your server's access logs. But to further complicate this the "attacker" may be using a series of IP addresses or have access to a botnet of ever-changing IPs. This can quickly become almost impossible to block without access to a more comprehensive firewall.
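
Where the access logs show the requests coming from one contiguous range rather than a single address, Require also accepts a CIDR range. A minimal sketch, assuming the range 203.0.113.0/24 (a documentation range used here purely for illustration):

```apache
<RequireAll>
    Require all granted
    # Block the whole (hypothetical) /24 the requests appear to come from
    Require not ip 203.0.113.0/24
</RequireAll>
```

This only helps while the addresses stay within a known range; it does nothing against a botnet of unrelated IPs.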

You can try other techniques such as issuing redirects to the canonical hostname from client-side code. However, more advanced "cloning" software (and/or reverse proxy servers) will "simply" modify the code/URLs to thwart your redirection attempts.

CodePudding user response:

I googled it and found this:

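RewriteEngine On
# Let requests with an empty Referer through
RewriteCond %{HTTP_REFERER} !^$
# Match requests whose Referer is not our own site
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/.*$ [NC]
# R=404 returns a 404 response; with a non-3xx status the substitution is ignored, so use "-"
RewriteRule ^ - [R=404,L]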