Scraping newbie asking for help: why do the scraped content and the URL in the address bar d

Time:09-16

I want to query a chemical information website by simulating a browser: enter a chemical's CAS number (which can be treated as a kind of ID) and scrape the information that comes back. The CAS number I submit is correct, yet the site responds with "number can't match with any data you entered", as if I had typed a wrong number.
So I tried using urllib.request to directly open the page that the browser jumps to after the CAS number is entered normally:

from bs4 import BeautifulSoup
import urllib.request
import ssl

url = "http://gestis-en.itrust.de/nxt/gateway.dll?qeingabe=&f=xhitlist&xhitlist_x=Advanced&xhitlist_s=field%3asortiername&xhitlist_q=%5bfield+schnellsuche%3a*7732z018z05*%5d&xhitlist_d=&xhitlist_hc=&xhitlist_mh=2000&xhitlist_vps=500&xhitlist_xsl=xhitlist.xsl&xhitlist_vpc=first&xhitlist_sel=title%3bpath%3brelevance-weight%3bcontent-type%3bhome-title%3bitem-bookmark%3bfield%3astoffname%3bfield%3asortiername%3bfield%3azvgnr%3bfield%3acasnr%3bfield%3aegnr%3bfield%3aindexnr%3bfield%3aunnr%3b&searchform_list=%23noselection"

headers = {
    "Host": "gestis-en.itrust.de",
    "Referer": "http://gestis-en.itrust.de/nxt/gateway.dll?f=userinfo&userinfo_xsl=banner.xsl&userinfo_cat=saved-search&isclient=",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36",
}

# Build the request with browser-like headers
answer = urllib.request.Request(url, headers=headers)
# The SSL context only matters for https:// URLs; it is unused for this http:// one
gcontext = ssl.SSLContext()
html = urllib.request.urlopen(answer, context=gcontext).read()
soup = BeautifulSoup(html, "html.parser")
print(soup)

The request succeeds, but what comes back is the following:


<meta content="text/html" http-equiv="content-type"/>
<title>Search Results</title>
<script>
var xh;

function initPage()
{
    var query = nxt.misc.getInputValue('js_params', 'query');
    var translatedQuery = nxt.misc.getInputValue('js_params', 'translatedQuery');

    var select = nxt.misc.getInputValue('js_params', 'select');
    var hitCount = parseInt(nxt.misc.getInputValue('js_params', 'hitCount'));

    xh = new nxt.comp.xhitlist.XHitList();
    xh.initPage(query, translatedQuery, select, hitCount, true, "112702960");
}

function closeMessage(link)
{
    xh.closeMessage(link);
}

</script>
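
The script in that response reads query, translatedQuery, select and hitCount from something named js_params, so the hit list itself is probably filled in by JavaScript after the page loads, and urllib does not run JavaScript. A minimal sketch for checking what the server actually put into those fields is below. It assumes (not confirmed for this site) that the values are ordinary <input> elements in the returned HTML. The names InputCollector and collect_inputs and the sample markup are made up for illustration, and it uses only the standard library instead of BeautifulSoup.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # An input without a value attribute is stored as an empty string
            self.values[name] = attributes.get("value") or ""


def collect_inputs(page):
    """Return a dict of input name -> value found in the HTML text `page`."""
    parser = InputCollector()
    parser.feed(page)
    return parser.values


# Hypothetical fragment, NOT copied from the site: it only illustrates the
# assumed shape of the js_params fields that initPage() reads.
sample = """
<form name="js_params">
  <input type="hidden" name="query" value="example query"/>
  <input type="hidden" name="hitCount" value="0"/>
</form>
"""

inputs = collect_inputs(sample)
print(inputs.get("hitCount"))
```

With the real response, passing html.decode(errors="replace") from the script above to collect_inputs would show whether hitCount is already 0 when the page leaves the server, or whether the data is only fetched later by the page's own JavaScript.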