Code for crawling the company site:
import requests
from bs4 import BeautifulSoup

url = "https://dt.58.com/tech/pn{}"

def spiders():
    for i in range(5):
        # Pages are numbered from 1, so page i + 1 is requested
        req = requests.get(url.format(str(i + 1)))
        req.encoding = "utf-8"
        soup = BeautifulSoup(req.text, "html.parser")
        items = soup.select("li.job_item")
        for item in items:
            address = item.select("div.item_con span.address")[0].text  # select() returns a list
            name = item.select("div.item_con span.name")[0].text
            href = item.select("div.item_con div.comp_name a.fl")[0].get("href")
            print("%s\t%s\t%s" % (address, name, href))

if __name__ == "__main__":
    spiders()
Code for crawling the job posting pages:
import requests
from bs4 import BeautifulSoup

url = "https://dt.58.com/tech/pn{}"

def spiders():
    for i in range(5):
        # Pages are numbered from 1, so page i + 1 is requested
        req = requests.get(url.format(str(i + 1)))
        req.encoding = "utf-8"
        soup = BeautifulSoup(req.text, "html.parser")
        items = soup.select("li.job_item")
        for item in items:
            address = item.select("div.item_con span.address")[0].text
            name = item.select("div.item_con span.name")[0].text
            href = item.select("div.item_con div.job_name.clearfix a")[0].get("href")
            print("%s\t%s\t%s" % (address, name, href))

if __name__ == "__main__":
    spiders()
That is the source code for both crawlers.
Also, once I have the job posting URLs from the list pages, can I follow them and crawl the second-level (detail) pages to get the job information?
Thanks, everyone. (This is my first post, so please point out anything I got wrong.)
CodePudding user response:
The code that crawls the job posting pages fails with IndexError: list index out of range. But aren't both of them just web pages? Is it because the job posting URL is too long?
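For reference, indexing [0] on an empty list is what raises IndexError in Python, so the error means one of the select() calls matched nothing for some item; the length of the URL is not involved. Why the selector comes back empty cannot be told from the post (the selector may not match the page's markup, or the site may have returned a different page). A minimal sketch of guarding against that, assuming the class names from the question; first_text, first_href, parse_items and the SAMPLE markup are hypothetical names and data made up for illustration:

```python
from bs4 import BeautifulSoup


def first_text(node, selector, default=""):
    """Text of the first element matching selector, or default if none matches."""
    found = node.select_one(selector)  # select_one() returns None instead of an empty list
    return found.get_text(strip=True) if found is not None else default


def first_href(node, selector):
    """href of the first element matching selector, or None if none matches."""
    found = node.select_one(selector)
    return found.get("href") if found is not None else None


def parse_items(html):
    """Extract (address, name, href) per list item without ever indexing an empty list."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("li.job_item"):
        rows.append((
            first_text(item, "div.item_con span.address"),
            first_text(item, "div.item_con span.name"),
            first_href(item, "div.item_con div.job_name a"),
        ))
    return rows


# Hypothetical markup: the second item lacks the address and the link on purpose.
SAMPLE = """
<ul>
  <li class="job_item"><div class="item_con">
    <span class="address">Datong</span><span class="name">Python developer</span>
    <div class="job_name clearfix"><a href="https://example.com/job/1">Job</a></div>
  </div></li>
  <li class="job_item"><div class="item_con"><span class="name">Tester</span></div></li>
</ul>
"""

if __name__ == "__main__":
    for row in parse_items(SAMPLE):
        print(row)
```

Printing the rows that come back with empty fields is a quick way to see which selector is failing and on which items.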
CodePudding user response:
To crawl the second-level pages, you can add a for loop over the job posting links in the job posting code (below)
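One way the extra loop could look is sketched here, assuming the list-page URL pattern and the li.job_item / div.job_name class names from the question. The detail-page layout is not shown anywhere in the thread, so parse_detail only reads the page title as a stand-in; fetch_html, job_links, parse_detail and crawl are hypothetical helper names.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

LIST_URL = "https://dt.58.com/tech/pn{}"  # list-page pattern taken from the question


def fetch_html(url):
    """Download a page and return its text decoded as UTF-8."""
    resp = requests.get(url, timeout=10)
    resp.encoding = "utf-8"
    return resp.text


def job_links(list_html, base_url):
    """Collect absolute job posting URLs from one list page."""
    soup = BeautifulSoup(list_html, "html.parser")
    links = []
    for a in soup.select("li.job_item div.job_name a"):
        href = a.get("href")
        if href:
            links.append(urljoin(base_url, href))  # resolves relative hrefs
    return links


def parse_detail(detail_html):
    """Placeholder: return the page title; replace with the real detail-page selectors."""
    soup = BeautifulSoup(detail_html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title is not None else ""


def crawl(pages=1, fetch=fetch_html):
    """Outer loop over list pages, inner loop over the job links found on each page."""
    results = []
    for page in range(1, pages + 1):
        list_url = LIST_URL.format(page)
        for link in job_links(fetch(list_url), list_url):
            results.append((link, parse_detail(fetch(link))))
    return results


if __name__ == "__main__":
    for link, title in crawl(pages=1):
        print("%s\t%s" % (link, title))
```

The fetch parameter exists only so the loop can be tried against saved HTML instead of the live site; calling crawl() with the default performs real requests.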