import urllib3
import requests
from time import sleep

def getStatusCode(url):
    # suppress the InsecureRequestWarning raised by verify=False
    urllib3.disable_warnings()
    r = requests.get(url, verify=False)
    return r.status_code

# driver is an existing Selenium WebDriver instance
driver.get("https://xxxxxxxxxx/")
sleep(3)
print(getStatusCode(driver.current_url))
This returns a 403 status code. Searching online, I found advice to add a header, as follows:
def getStatusCode(url):
    urllib3.disable_warnings()
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
    r = requests.get(url, verify=False, headers=headers)
    return r.status_code

driver.get("https://xxxxxxx/")
sleep(3)
print(getStatusCode(driver.current_url))
Now it returns status code 200. Can anyone explain why? I need to fetch status codes for a lot of web pages here; surely I can't write a header for each one individually?
CodePudding user response:
You certainly need to send a header; just use a universal one.
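A minimal sketch of that idea, assuming the URLs have been collected into a list (the list and page paths below are hypothetical placeholders): one requests.Session carries a single browser-style User-Agent for every request, so nothing is written per page.

import urllib3
import requests

urllib3.disable_warnings()

# One universal header reused for every URL via a shared Session
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'})

def getStatusCode(url):
    return session.get(url, verify=False).status_code

urls = ['https://xxxxxxx/page1', 'https://xxxxxxx/page2']  # hypothetical list
for url in urls:
    print(url, getStatusCode(url))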
CodePudding user response:

A 403 is an active refusal: the server has detected you. Headers are just how a client identifies itself; you don't need to copy every field. Unless the site has special verification, GET requests usually don't face many restrictions, and the User-Agent alone is enough; there are plenty of ways to build headers. Let me show you, from the server's side, the difference between a crawler with no headers and a normal browser. The server only has to look at your identity, and you can guess which one it refuses.
This is the crawler: (screenshot)

This is a normal browser: (screenshot)
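For anyone who can't see the screenshots, here is a rough way to reproduce the difference yourself. This sketch uses httpbin.org/headers, a public echo service (my choice for illustration, not from the thread), which returns the headers it received, so you can compare the default identity requests sends with a browser-style one.

import requests

# Default identity: requests announces itself as python-requests/x.y.z
bare = requests.get('https://httpbin.org/headers')
print(bare.json()['headers']['User-Agent'])

# Browser-style identity: the same Chrome User-Agent from the question
browser_like = requests.get(
    'https://httpbin.org/headers',
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'},
)
print(browser_like.json()['headers']['User-Agent'])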
CodePudding user response:
If you crawl Baidu, you also have to add Accept to the header.
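A sketch of that advice, assuming the Accept value a typical browser sends (the exact value a given site checks may differ):

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
    # Accept value a browser typically sends; per the advice above, Baidu checks it too
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
}
r = requests.get('https://www.baidu.com/', headers=headers)
print(r.status_code)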
CodePudding user response:

Disguise yourself as a regular user, because sites block requests they recognize as crawlers. It is best to write out the complete headers for each page you want to crawl, so problems don't come up later, since every page may require something different.
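One way to do that, sketched below with placeholder values: copy the full request headers for the target page from the browser's DevTools Network tab. Every field and URL here is a hypothetical example to adapt per site, not taken from the question's site.

import requests

# Header fields copied from DevTools for the specific target page;
# all values below are placeholders
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Referer': 'https://example.com/',  # hypothetical referring page
    'Connection': 'keep-alive',
}
r = requests.get('https://example.com/target-page', headers=headers, timeout=10)
print(r.status_code)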