I ran into a problem while crawling Duitang (https://www.duitang.com/topics/#!hot-p3) and a job site (https://search.51job.com/list/000000,000000,0000,00,9,99,UI,2,1.html).
Both sites render their pages dynamically with JavaScript. With the same crawl code, Duitang prints the rendered page source, but the job site does not print the complete page. What is the reason for this (is it AJAX asynchronous loading)?
How can I crawl the full page of the job site? (I haven't learned selenium yet.)
The code is as follows:
import urllib.request
import urllib.error

url = 'https://www.duitang.com/topics/#!hot-p3'
# Pretend to be a desktop browser so the site does not reject the request.
head = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50"
}
request = urllib.request.Request(url, headers=head)
html = ""
try:
    response = urllib.request.urlopen(request)
    html = response.read().decode('utf-8')
    print(html)
except urllib.error.URLError as e:
    # HTTP errors carry a status code; connection errors only carry a reason.
    if hasattr(e, "code"):
        print(e.code)
    if hasattr(e, "reason"):
        print(e.reason)
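One detail about the Duitang URL may be worth checking. The part after `#` is a URL fragment, and HTTP clients do not send fragments to the server, so the request above is for the plain `/topics/` page, and the `!hot-p3` part is only interpreted by the page's JavaScript in a browser. A minimal sketch that splits the URL with the standard library (the variable names `base` and `fragment` are just for illustration):

```python
from urllib.parse import urldefrag

url = "https://www.duitang.com/topics/#!hot-p3"

# urldefrag separates the URL into the part before "#" and the fragment.
base, fragment = urldefrag(url)

print(base)      # the URL without the fragment
print(fragment)  # the fragment, which is not sent to the server
```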
CodePudding user response:
This is the code for the job site. It is almost unchanged; only the URL and the decoding differ:
import urllib.request
import urllib.error

url = 'https://search.51job.com/list/000000,000000,0000,00,9,99,UI,2,1.html'
# Pretend to be a desktop browser so the site does not reject the request.
head = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50"
}
request = urllib.request.Request(url, headers=head)
html = ""
try:
    response = urllib.request.urlopen(request)
    # This page is decoded as GBK rather than UTF-8.
    html = response.read().decode('gbk')
    print(html)
except urllib.error.URLError as e:
    # HTTP errors carry a status code; connection errors only carry a reason.
    if hasattr(e, "code"):
        print(e.code)
    if hasattr(e, "reason"):
        print(e.reason)
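Since the two scripts differ only in the URL and the charset, they could be folded into one function, which makes it easier to compare what each site actually returns. The sketch below is only an illustration, not a fix for the job site: `fetch_html`, `contains_marker`, the `opener` parameter, the 10-second timeout and `errors="replace"` are all assumptions that are not in my original code. The idea is to pick a piece of text that is visible in the browser and check whether it is present in the raw response. If it is missing, that part of the page is probably filled in later by JavaScript.

```python
import urllib.error
import urllib.request

# Same User-Agent string as in the scripts above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50"
)


def fetch_html(url, encoding, opener=urllib.request.urlopen):
    """Return the decoded response body, or "" if the request fails.

    `opener` is a hypothetical hook so the function can be tried
    without network access; by default it is urllib.request.urlopen.
    """
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with opener(request, timeout=10) as response:
            # errors="replace" keeps going if a few bytes do not fit the charset.
            return response.read().decode(encoding, errors="replace")
    except urllib.error.URLError as e:
        # HTTPError (a URLError subclass) has .code; every URLError has .reason.
        print(getattr(e, "code", None), getattr(e, "reason", None))
        return ""


def contains_marker(html, marker):
    """True if `marker` (text seen in the browser) is in the raw HTML."""
    return marker in html
```

For example, `fetch_html(url, "gbk")` would be used for the job site and `fetch_html(url, "utf-8")` for Duitang, followed by `contains_marker(html, some_text_from_the_page)` to see whether that text came back from the server at all.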
CodePudding user response:
Under the roof ZSBD