Some problems crawling 58.com job posting information

Time:09-30

I'm a complete beginner. I'm trying to crawl the computer-related job postings on 58.com. I went through the page source, and the href I ended up crawling is the URL of the company introduction page. I want to get the URL of the job posting instead, but I haven't managed to.
Code that crawls the company URLs:
import requests
from bs4 import BeautifulSoup

url = "https://dt.58.com/tech/pn{}"

def spiders():
    for i in range(5):
        req = requests.get(url.format(str(i + 1)))
        req.encoding = "utf-8"
        soup = BeautifulSoup(req.text, "html.parser")
        items = soup.select("li.job_item")
        for item in items:
            address = item.select("div.item_con span.address")[0].text  # select() returns a list
            name = item.select("div.item_con span.name")[0].text
            href = item.select("div.item_con div.comp_name a.fl")[0].get("href")
            print("%s\t%s\t%s" % (address, name, href))

if __name__ == "__main__":
    spiders()
Code that tries to crawl the job posting URLs:
import requests
from bs4 import BeautifulSoup

url = "https://dt.58.com/tech/pn{}"

def spiders():
    for i in range(5):
        req = requests.get(url.format(str(i + 1)))
        req.encoding = "utf-8"
        soup = BeautifulSoup(req.text, "html.parser")
        items = soup.select("li.job_item")
        for item in items:
            address = item.select("div.item_con span.address")[0].text
            name = item.select("div.item_con span.name")[0].text
            # [0] on the next line is where IndexError: list index out of range is raised
            href = item.select("div.item_con div.job_name clearfix a")[0].get("href")
            print("%s\t%s\t%s" % (address, name, href))

if __name__ == "__main__":
    spiders()

This is the page source of the site being crawled.


By the way, is it possible to crawl the job posting URLs first and then crawl the second-level pages they point to, to get the job details?
Thanks, everyone. (This is my first time posting a question, so please point out anything I got wrong.)

CodePudding user response:

The code that crawls the job posting URLs does not match anything and fails with IndexError: list index out of range.
But aren't both of them URLs? Is it because the job posting URL is too long?
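
The length of the URL should not matter here. A more likely cause is the selector: in CSS a space means "descendant", so `div.job_name clearfix a` looks for an `<a>` inside a `<clearfix>` tag, finds nothing, and `[0]` on the empty list raises the IndexError. Here is a minimal sketch, assuming the link sits in an element like `<div class="job_name clearfix">`. The sample HTML and the example.com URLs are made up for illustration, and the real 58.com markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; check the real page source.
SAMPLE_HTML = """
<ul>
  <li class="job_item">
    <div class="item_con">
      <div class="job_name clearfix">
        <a href="https://example.com/job/1">Python developer</a>
      </div>
      <div class="comp_name">
        <a class="fl" href="https://example.com/company/1">Example Co.</a>
      </div>
    </div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item = soup.select("li.job_item")[0]

# A space means "descendant": this asks for an <a> inside a <clearfix> tag
broken = item.select("div.item_con div.job_name clearfix a")

# Chain the classes with dots to match one element that carries both
fixed = item.select("div.item_con div.job_name.clearfix a")

# select_one() returns None instead of raising when nothing matches
link = item.select_one("div.item_con div.job_name a")
job_href = link.get("href") if link is not None else None

print(len(broken), len(fixed), job_href)
```

Matching on `div.job_name` alone is usually enough, since `clearfix` is typically just a layout helper class.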

CodePudding user response:

To crawl the second-level pages, you can add a for loop over the job posting URLs you collected.
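
A rough sketch of that loop, under some assumptions: the function names `fetch`, `extract_job_links` and `crawl_details` are made up for this example, the `div.job_name a` selector is a guess at the listing markup, and since the layout of the detail page is not shown in this thread, the sketch only reads its `<title>`.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def fetch(url):
    """Download a page and return its HTML text."""
    resp = requests.get(url, timeout=10)
    resp.encoding = "utf-8"
    return resp.text


def extract_job_links(list_html, base_url):
    """Collect the job posting URLs from one listing page."""
    soup = BeautifulSoup(list_html, "html.parser")
    links = []
    for item in soup.select("li.job_item"):
        # Assumed selector; adjust it to the real markup
        a = item.select_one("div.job_name a")
        if a is not None and a.get("href"):
            # Resolve relative hrefs against the listing page URL
            links.append(urljoin(base_url, a["href"]))
    return links


def crawl_details(list_html, base_url, fetch_page=fetch):
    """Visit every job posting found on one listing page."""
    results = []
    for href in extract_job_links(list_html, base_url):
        detail = BeautifulSoup(fetch_page(href), "html.parser")
        # The detail-page layout is unknown here, so only <title> is read
        title = detail.title.get_text(strip=True) if detail.title else ""
        results.append((href, title))
    return results
```

For one listing page this could be called as `crawl_details(fetch(page_url), page_url)`, where `page_url` is something like `url.format(1)` from the original script. Replace the `<title>` line with selectors for the fields you actually want from the detail page, and consider adding a short delay between requests.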