Python Scrapy spider crawls no data when following the official example

Time:09-20

I followed the official tutorial step by step.
The key question is this passage from the tutorial:
"Note the lines containing [dmoz], which correspond to our spider. You can see a log line for each URL defined in start_urls. Because these URLs are the starting ones, they have no referrers, which is shown at the end of each line as (referer: None)."
It also says that, in our parse method, two files are created, Books and Resources, containing the content of the corresponding pages.
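Those two file names come straight from the crawled URLs; a minimal sketch of the derivation the tutorial's parse method uses (taking the second-to-last path segment of each URL):

```python
# The tutorial's parse() names each output file after the second-to-last
# URL path segment, which yields "Books" and "Resources" for these pages.
start_urls = [
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
]
filenames = [url.split("/")[-2] for url in start_urls]
print(filenames)  # ['Books', 'Resources']
```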

But when I follow the same steps myself, the output below has no [dmoz] lines at all. Why is that?
C:\Python27\Scripts\tutorial>scrapy crawl dmoz
C:\Python27\Scripts\tutorial\tutorial\spiders\dmoz_spider.py:3: ScrapyDeprecationWarning: tutorial.spiders.dmoz_spider.DmozSpider inherits from deprecated class scrapy.spiders.BaseSpider, please inherit from scrapy.spiders.Spider. (warning only on first subclass, there may be others)
  class DmozSpider(BaseSpider):
2016-07-01 16:37:56 [scrapy] INFO: Scrapy 1.1.0 started (bot: tutorial)
2016-07-01 16:37:56 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'tutorial'}
2016-07-01 16:37:56 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-07-01 16:37:57 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-07-01 16:37:57 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-07-01 16:37:57 [scrapy] INFO: Enabled item pipelines:
[]
2016-07-01 16:37:57 [scrapy] INFO: Spider opened
2016-07-01 16:37:57 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-07-01 16:37:57 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-07-01 16:37:58 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/robots.txt> (referer: None)
2016-07-01 16:37:59 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2016-07-01 16:38:00 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2016-07-01 16:38:00 [scrapy] INFO: Closing spider (finished)
2016-07-01 16:38:00 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 734,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'downloader/response_bytes': 16851,
 'downloader/response_count': 3,
 'downloader/response_status_count/200': 3,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 7, 1, 8, 38, 0, 960000),
 'log_count/DEBUG': 4,
 'log_count/INFO': 7,
 'response_received_count': 3,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2016, 7, 1, 8, 37, 57, 464000)}
2016-07-01 16:38:00 [scrapy] INFO: Spider closed (finished)

C:\Python27\Scripts\tutorial>

CodePudding user response:

Quoting RUC_Godhao's reply on the 1st floor:
Can you tell me whether you solved the problem?
Solved. Did you run into this problem too?