Home > front end >  The crawler ajax caught
The crawler ajax caught

Time:03-22

Want to ask, questions about dynamic rendering js ajax

My sugar in crawl pile (https://www.duitang.com/topics/#! Hot - p3) and job sites (https://search.51job.com/list/000000, 000000000 0,00,9,99, UI, 2, 1. HTML) problems found in the

Two sites are js dynamic rendering of closed (js), the same crawl code, heap sugar can print out page rendered content source, but the job site unable to print out a complete web page, this is the reason why (ajax asynchronous synchronous?)

How to crawl job site full page (selenium haven't learned)


The code is as follows:
Url='https://www.duitang.com/topics/#! Hot - p3 '

The head={
"The user-agent: Mozilla/5.0 (Windows NT 10.0; Win64. X64)
""AppleWebKit/537.36 (KHTML, like Gecko)"
"Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50"
}

Request=urllib. Request. The request (url, headers=head)
HTML=""
Try:
The response=urllib. Request. Urlopen (request)
HTML=response. The read (). The decode (' utf-8)
Print (HTML)
Except urllib. Error. URLError as e:
If hasattr (e, "code") :
Print (ode) of e.c. with our fabrication:
If hasattr (e, "" reason") :
Print (" e.r eason)

CodePudding user response:

This is getting job site code, almost no change
Url=https://search.51job.com/list/000000, 000000000 0,00,9,99, UI, 2, 1. HTML '

The head={
"The user-agent: Mozilla/5.0 (Windows NT 10.0; Win64. X64)
""AppleWebKit/537.36 (KHTML, like Gecko)"
"Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50"
}

Request=urllib. Request. The request (url, headers=head)
HTML=""
Try:
The response=urllib. Request. Urlopen (request)
HTML=response. The read (). The decode (' GBK ')
Print (HTML)
Except urllib. Error. URLError as e:
If hasattr (e, "code") :
Print (ode) of e.c. with our fabrication:
If hasattr (e, "" reason") :
Print (" e.r eason)

CodePudding user response:

Under the roof ZSBD
  •  Tags:  
  • Ajax
  • Related