First, the structure of our site. I run site A, a comics website. Visitors reach it through a front-end server, and the image URLs point to a separate server B with CDN acceleration.
The site that reverse-proxies us is set up the same way: a front-end server plus an image server, each reverse-proxying ours and caching our files. That means once one of our HTML pages (say 1.html) has been proxied, the copy is frozen on their server and we can no longer change it.
There are plenty of methods online for blocking a reverse proxy:
1) The JS approach. Put redirect code in the page: if the domain is not ours, jump to our site. I wrote it every way I could think of: adding variables, adding comments, moving the code to a different place every day. After being redirected a few times, the other side simply filtered the key code "document.domain" out of the page; wherever that code appears, it is replaced with blank. This approach: dead.
2) After trying all sorts of methods, the most direct one is to ban IPs. I created a new page on our own server that only the other side's proxy would ever crawl, then opened the log to see which IPs requested it. Sure enough, they were stealing our content by posing as the Baidu spider. Because they mirror our site, we banned their IP automatically, and that worked for 2 days. Once they noticed the proxy failing, they added new proxy rules: every time I banned an IP, they switched to another one. Later they gave up mirroring our site automatically and switched to manual mirroring. After my daily update they would turn the proxy on, grab the content, and turn it off again, so they still stayed in sync with my content every day. They also stopped caching our files, so we could no longer catch their IP. This approach: dead.
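The log check described in step 2 could be scripted roughly as follows. This is only a sketch, assuming the server writes W3C extended logs (the format IIS uses, with a `#Fields:` line naming the columns). The trap path `/trap/only-for-mirror.html` and the sample addresses are made-up placeholders, not values from the original setup.

```python
from typing import Iterable, Set


def honeypot_ips(log_lines: Iterable[str], trap_path: str) -> Set[str]:
    """Collect client IPs that requested trap_path in a W3C extended log."""
    fields = []
    ips = set()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        # Values are space-separated; spaces inside a value are logged as '+'.
        row = dict(zip(fields, line.split(" ")))
        if row.get("cs-uri-stem", "").lower() == trap_path.lower():
            ip = row.get("c-ip")
            if ip:
                ips.add(ip)
    return ips


# Hypothetical sample log; real logs usually carry more columns.
sample = [
    "#Fields: date time cs-method cs-uri-stem c-ip cs(User-Agent) sc-status",
    "2017-05-29 07:01:12 GET /index.html 198.51.100.7 Mozilla/5.0 200",
    "2017-05-29 07:02:40 GET /trap/only-for-mirror.html 203.0.113.9 Baiduspider 200",
    "2017-05-29 07:03:05 GET /trap/only-for-mirror.html 203.0.113.24 Baiduspider 200",
]
print(sorted(honeypot_ips(sample, "/trap/only-for-mirror.html")))
```

The result is only a candidate list. A genuine search-engine spider could also find the trap page if it is linked anywhere, so each address is worth a manual look before it goes into a ban list.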
3) After a few days of standoff, I found that they were not actually mirroring our site by hand every day. They had set up scheduled mirroring that runs automatically around 3 PM and 10 PM each day. If the mirror cannot fetch our content, their site briefly returns errors, and only then do they step in manually. So, clever me thought of a way: since they no longer cache any files automatically, they still have to fetch the core CSS and JS files every time, right? So I rename the CSS and JS files on my own site every day, so that on their site the CSS and JS stop working and the page layout breaks. For example, my site's CSS file is named css.css; each day when I generate the pages I rename it to css201705291.css (date-stamped, so the file name changes with every daily update). Then I keep refreshing their site during the day. As soon as I see that their automatic mirror of my site has run, I immediately rename the CSS file again, to css201705292.css, which breaks the styles on their copy. Then I watch the IIS log for IPs that still request css201705291.css. At that point it is their side fetching this file while they update manually, so I ban those IPs. This went on for a week. It was a hassle for me too, but at least it annoyed them and kept them from proxying us in comfort. Even so, they still managed to stay in sync with my content every day. Renaming the site's CSS and JS files: dead.
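The renaming trick in step 3 can be sketched in two small helpers: one builds the date-stamped file name, the other picks out clients that still ask for a retired name. The function names and the `(ip, file_name)` input shape are assumptions for illustration; how the pairs are pulled out of the IIS log is left open.

```python
import re
from datetime import date


def dated_asset_name(prefix: str, day: date, revision: int, ext: str) -> str:
    """Build a name such as css201705291.css: prefix + YYYYMMDD + revision."""
    return f"{prefix}{day:%Y%m%d}{revision}.{ext}"


def stale_asset_ips(requests, current_name: str, prefix: str, ext: str) -> set:
    """Return IPs that requested a dated asset other than the current one.

    requests is an iterable of (client_ip, requested_file_name) pairs.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}\d{{8}}\d+\.{re.escape(ext)}$")
    return {
        ip
        for ip, name in requests
        if pattern.match(name) and name != current_name
    }


old_name = dated_asset_name("css", date(2017, 5, 29), 1, "css")
new_name = dated_asset_name("css", date(2017, 5, 29), 2, "css")

# Hypothetical requests seen after the rename; addresses are placeholders.
seen = [
    ("198.51.100.7", new_name),   # regular visitor on a fresh page
    ("203.0.113.9", old_name),    # still asking for the retired file
    ("203.0.113.9", "logo.png"),  # unrelated file, ignored
]
print(old_name, new_name, stale_asset_ips(seen, new_name, "css", "css"))
```

One caveat with this signal: a real visitor whose browser still holds an older HTML page will also request the retired name for a short while, so hits right after the rename are less trustworthy than hits that keep arriving later.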
4) After I had annoyed them for a few days, I found their reverse proxy had new rules again: they had turned automatic mirroring back on. I went back to the old trick of generating an HTML page in a specific directory to catch the IP that fetches it, but every time I banned an IP they came back with a new one. Only their proxy ever crawls that page, and their spider fetches it twice, from two IPs. The first IP is fixed; if that fetch fails, the second attempt comes from a randomized IP. Articles online say that banning a few of the other side's IPs is enough to take them down, but I have no idea where they get so many foreign IPs. I spent a whole morning banning dozens of IPs and still could not get through them all, not even by banning whole IP ranges. (picture)
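When single-address bans cannot keep up, one thing to check is whether the caught addresses cluster into a few networks. Below is a minimal sketch using Python's standard `ipaddress` module. It assumes IPv4 addresses and a /24 grouping, both of which are arbitrary choices, and the sample addresses are placeholders. If the proxy really draws from unrelated networks, as the post suggests, the output will be nearly empty, which is itself useful to know.

```python
import ipaddress
from collections import Counter


def busiest_networks(ips, prefix_len: int = 24, min_hits: int = 2):
    """Group IPv4 addresses by network and list networks seen min_hits+ times."""
    counts = Counter(
        # strict=False lets a host address stand in for its enclosing network.
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in ips
    )
    return [(str(net), n) for net, n in counts.most_common() if n >= min_hits]


# Hypothetical addresses collected from the trap page.
caught = [
    "203.0.113.9", "203.0.113.24", "203.0.113.77",
    "198.51.100.7",
    "192.0.2.1", "192.0.2.200",
]
print(busiest_networks(caught))
```

Banning a whole range also blocks any legitimate visitors inside it, so a higher `min_hits` threshold is the more cautious setting.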
In the end, a month of sparring and mutual annoyance came to a close. I give up, helplessly, watching their Baidu statistics show a higher PV than my own site. I bow to the boss on the other side. This post is written a bit messily; I don't know whether the experts on CSDN can follow it, or whether anyone here has dealt with an automatic reverse-proxy site that has unlimited IPs. As for filing a complaint, forget it: if going to Baidu were any use, I would not be handling this myself.
CodePudding user response:
He mirrors you, and his PV ends up higher than yours? Crawling can never be completely prevented. All you can do is pin down its characteristics: log the request headers and look for patterns, such as a fixed User-Agent or Referer URL, then block based on those characteristics.
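Looking for patterns in the request headers might start with something like the sketch below, which counts the most frequent (User-Agent, Referer) pairs among suspicious requests. The dict keys `user_agent` and `referer` and the sample values are assumptions for illustration; they would be filled from whatever the server actually logs.

```python
from collections import Counter


def common_fingerprints(requests, top: int = 3):
    """Count (User-Agent, Referer) pairs and return the most frequent ones.

    requests is an iterable of dicts with 'user_agent' and 'referer' keys.
    """
    counts = Counter(
        (r.get("user_agent", ""), r.get("referer", "")) for r in requests
    )
    return counts.most_common(top)


# Hypothetical header data from requests to pages only the mirror fetches.
suspicious = [
    {"user_agent": "Baiduspider", "referer": ""},
    {"user_agent": "Baiduspider", "referer": ""},
    {"user_agent": "Mozilla/5.0", "referer": "http://example.com/"},
    {"user_agent": "Baiduspider", "referer": ""},
]
print(common_fingerprints(suspicious))
```

A pair that dominates the suspicious requests but is rare in normal traffic is a candidate blocking rule. A User-Agent that merely claims to be a search-engine spider is not enough on its own, since blocking it would hit the real spider too.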
CodePudding user response: