Malicious website owners are using the contents of our website (say, example.com) on their own websites (say, spam.com), with code like this:
<?php
$url='https://example.com/';
// using file() function to get content
$lines_array=file($url);
// turn array into one variable
$lines_string=implode('',$lines_array);
//output, you can also save it locally on the server
echo $lines_string;
?>
We want to prevent the contents of our website from displaying on their websites and redirect those requests to a warning page on our website (to a webpage and not an image).
After doing some R&D, we tried doing this:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https://example\.com/.*$ [NC]
RewriteRule ^(.*) https://example.com/404 [R=301,L]
</IfModule>
But it doesn't work. What are we doing wrong?
Reference: htaccess prevent hotlink also prevents external links
CodePudding user response:
"Hotlinking" and "webpage scraping" are two very different things. What you describe with the snippet of simplified PHP code is a form of "webpage scraping" or even "cloning". This does not (or is very unlikely to) generate a Referer header in the request, so it cannot be blocked by simply checking the Referer (i.e. the HTTP_REFERER server variable) as you would do with "hotlinking".
(Your example mod_rewrite code blocks "hotlinking", not "scraping/cloning".)
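To see why the Referer check never fires, here is a minimal sketch that starts a throwaway local server and fetches a page from it programmatically. Python's urllib stands in for PHP's file() here, which is an assumption of this illustration; the names RecordingHandler and fetch_like_a_scraper are hypothetical. The server records which headers the "scraper" actually sent.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_headers = []  # one dict of (lowercased) request headers per request received


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: records request headers and returns a tiny page."""

    def do_GET(self):
        seen_headers.append({k.lower(): v for k, v in self.headers.items()})
        body = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_like_a_scraper():
    """Fetch one page from a local server and return the headers it received."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        with urllib.request.urlopen(url) as response:
            response.read()
    finally:
        server.shutdown()
        server.server_close()
    return seen_headers[-1]


if __name__ == "__main__":
    headers = fetch_like_a_scraper()
    print("referer sent:", "referer" in headers)
```

A browser following a link or loading an embedded image normally adds a Referer header; a plain server-side fetch like this one does not, so a rule keyed on HTTP_REFERER has nothing to match against.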
The only way to block these types of requests is to block the IP address of the server making the request. For example, if the "malicious" requests are coming from 203.0.113.111 then you would do something like the following in the Apache 2.4 config (or .htaccess file) to block such requests:
<RequireAll>
Require all granted
Require not ip 203.0.113.111
</RequireAll>
However, the requests may not be coming from the same IP address that is hosting the "cloned" content. You'll need to determine this from your server's access logs. To complicate matters further, the "attacker" may be using a series of IP addresses or have access to a botnet of ever-changing IPs. This can quickly become almost impossible to block without access to a more comprehensive firewall.
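As a rough starting point for the log inspection, the sketch below counts requests per client address. It assumes the access log uses Apache's common or combined format, where the client address is the first whitespace-delimited field; the function name top_client_ips and the sample lines are made up for illustration.

```python
from collections import Counter


def top_client_ips(log_lines, limit=10):
    """Return the `limit` most frequent client addresses as (ip, count) pairs.

    Assumes each line starts with the client address, as in Apache's
    common/combined log formats. Blank lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        client_ip = line.split(" ", 1)[0]
        counts[client_ip] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    # Made-up sample entries in combined log format (illustrative only).
    sample = [
        '203.0.113.111 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "-"',
        '198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 2048 "https://example.com/" "Mozilla/5.0"',
        '203.0.113.111 - - [10/Oct/2023:13:56:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "-"',
    ]
    for ip, count in top_client_ips(sample):
        print(ip, count)
```

With a real log you could pass an open file object instead of the sample list. An address that repeatedly requests pages with an empty Referer and an empty or generic User-Agent is a candidate for the Require not ip rule, though a high request count alone is not proof of scraping.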
You can try other techniques such as issuing redirects to the canonical hostname from client-side code. However, more advanced "cloning" software (and/or reverse proxy servers) will "simply" modify the code/URLs to thwart your redirection attempts.
CodePudding user response:
I searched for this and found the following:
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/.*$ [NC]
# R=404 returns a 404 response; with a non-3xx status the substitution URL is ignored
RewriteRule ^(.*)$ http://www.example.com/404 [R=404,L]