I am building a spider to crawl different tabs on a page.
In some cases I need to extract a URL to follow to the next page:
url = i.css('a').attrib['href']
yield response.follow(url=url, callback=self.parse_menu)
And in other cases I don't need to go to a different page, but I still want to move to the next step in the pipeline (parse_menu), so I do something like this:
yield response.follow(url=response.url, callback=self.parse_menu)
The first scenario works well, but in the second scenario parse_menu never gets called.
I think I am missing something about how requests and callbacks work.
Thanks in advance!
CodePudding user response:
Try extracting the href with a CSS attribute selector instead:
url = i.css('a::attr(href)').get()
CodePudding user response:
I am not sure if I understand you well, but I think you are sending a request to a URL that Scrapy has already visited (response.url), so the built-in duplicate filter silently drops it and the callback never runs. Set dont_filter to True to bypass the filter:
yield response.follow(url=response.url, callback=self.parse_menu, dont_filter=True)
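To see why the second request is dropped: by default Scrapy computes a fingerprint for every request and discards any request whose fingerprint it has already seen. Below is a minimal, URL-only sketch of that idea (the real filter is scrapy.dupefilters.RFPDupeFilter, which also hashes the method and body; this toy version is only an illustration):

```python
import hashlib

class ToyDupeFilter:
    """Toy stand-in for Scrapy's default request deduplication."""

    def __init__(self):
        self.seen = set()

    def request_seen(self, url, dont_filter=False):
        """Return True if the request should be dropped as a duplicate."""
        if dont_filter:
            return False  # bypass the filter entirely, like dont_filter=True
        fp = hashlib.sha1(url.encode()).hexdigest()
        if fp in self.seen:
            return True   # already crawled: request is silently dropped
        self.seen.add(fp)
        return False

f = ToyDupeFilter()
print(f.request_seen("https://example.com/menu"))                    # first visit: not dropped
print(f.request_seen("https://example.com/menu"))                    # same URL again: dropped
print(f.request_seen("https://example.com/menu", dont_filter=True))  # bypassed: not dropped
```

This is why response.follow(url=response.url, ...) goes nowhere without dont_filter=True: that URL's fingerprint was recorded when the current response was fetched.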