Scrapy: crawling job titles and job descriptions from the Boss recruitment site

Time:11-15


I followed a video tutorial and copied its crawler, which uses Scrapy to crawl job titles and job descriptions from the Boss recruitment site. The site seems to have been updated since the video was made, so part of the teacher's code no longer works. I changed some of the XPath paths myself, but for some reason the spider crawls nothing, and it does not report any errors either.

Could an expert passing by please take a look? I have spent a whole day checking and still don't know what went wrong. The source code is below, and I have also packed up the whole project and uploaded it to Baidu Netdisk.
Line 16 crawls the job title.
Line 10 crawls the job description.
Line 18 is the job description detail page. The URL is:
https://www.zhipin.com/job_detail/?query=python&city=101010100&industry=&position=

My guess is that the XPath is written wrong? Everything else follows the teacher's code from the video.
 
import scrapy


class BossSpider(scrapy.Spider):
    name = 'boss'
    # allowed_domains = ['www.xxx.com']
    start_urls = ['https://www.zhipin.com/job_detail/?query=python&city=101010100&industry=&position=']

    def parse_detail(self, response):
        job_desc = response.xpath('//*[@id="main"]/div[3]/div/div[2]/div[2]/div[1]/div//text()').extract()
        job_desc = ''.join(job_desc)
        print(job_desc)
    def parse(self, response):
        li_list = response.xpath('//*[@id="main"]/div/div[2]/ul/li')
        for li in li_list:
            job_name = li.xpath('.//div/@/span[1]/a/text()').extract_first()
            print(job_name)
            detail_url = 'https://www.zhipin.com' + li.xpath('.//div/@/span[1]/a/@href').extract_first()
            yield scrapy.Request(detail_url, callback=self.parse_detail)
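
One thing worth checking on line 18: extract_first() returns None when the XPath matches nothing, and concatenating a string with None raises a TypeError. The sketch below is only an illustration, not the teacher's code. The helper name build_detail_url and the sample href are made up for the example. It builds the detail URL with the standard library and returns None when nothing matched, so the empty case can be seen and skipped.

```python
from typing import Optional
from urllib.parse import urljoin

BASE_URL = "https://www.zhipin.com"


def build_detail_url(href: Optional[str], base: str = BASE_URL) -> Optional[str]:
    """Return an absolute detail-page URL, or None when the XPath matched nothing.

    build_detail_url is a hypothetical helper, not part of Scrapy.
    """
    if not href:
        # extract_first() gives None when the selector finds no node;
        # 'https://...' + None would raise TypeError, so report "no URL" instead.
        return None
    # urljoin resolves a relative href against the site root.
    return urljoin(base, href.strip())


# The href below is a made-up sample value, not a real posting.
print(build_detail_url("/job_detail/abc.html"))
print(build_detail_url(None))
```

In parse, the result could be checked with "if detail_url:" before yielding the Request. If job_name and the href come back as None for every li, or li_list itself is empty, the selectors are not matching the page that the spider actually received.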




Link: https://pan.baidu.com/s/1iWJkQZgqerlFoR9rONObgQ
Extraction code: 1234
