I am trying to use Scrapy to grab some data from a public website. Thankfully, most of the data can be found in an XHR request that the page makes.
But when I double-click the request to see the actual response, there is no data in the search_results item.
I am wondering what is going on with this request and how I can access the data in Scrapy. Currently I am trying the following, but it is not grabbing any data from the response:
import scrapy
from scrapy import Spider


class Whizzky(Spider):
    name = "whizzky"

    def __init__(self):
        self.request_url = "https://www.whizzky.net/webapi/get_finder_results.php?cid=31&flavours=&view=rated&price=3&country=&regions="

    def start_requests(self):
        urls = ["https://www.whizzky.net/finder_results.php"]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        yield scrapy.Request(self.request_url,
                             method='POST',
                             callback=self.parse_2)

    def parse_2(self, response):
        info = {}
        info["data"] = response.json()["search_results"]
        yield info
CodePudding user response:
Actually, the response is working fine, and your code structure is fine too. The API returns its JSON data in response to a POST request, so to pull the data correctly you need to send a content-type header and pass the payload data as the body parameter of the request.
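As a small illustration of the payload part, here is a sketch of how a form-encoded body can be built from a dict with the standard library instead of being written by hand. The helper name build_form_body is hypothetical; the field names and values are the ones used in the body string of this answer. Scrapy's FormRequest with its formdata argument is meant to do a similar encoding for you, but treat that as a pointer to check in the docs rather than a tested drop-in.

```python
from urllib.parse import urlencode


def build_form_body(fields):
    """Encode a dict as an application/x-www-form-urlencoded string.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    return urlencode(fields)


# Field names and values copied from the body string used in this answer.
body = build_form_body({"maxResults": 30, "pager": 30})

# The header that tells the server how the body is encoded.
headers = {"content-type": "application/x-www-form-urlencoded"}

print(body)  # maxResults=30&pager=30
```

The resulting string can be passed as body= to scrapy.Request together with method="POST" and the headers dict.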
A full working example:
import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'
    # Form-encoded payload sent as the POST body
    body = 'maxResults=30&pager=30'

    def start_requests(self):
        api_url = 'https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions='
        yield scrapy.Request(
            url=api_url,
            callback=self.parse,
            body=self.body,
            method="POST",
            headers={
                "content-type": "application/x-www-form-urlencoded",
                "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
            })

    def parse(self, response):
        # Each entry in search_results is one product card
        for card in response.json()['search_results']:
            yield {'Title': card['product_title']}
Output:
{'Title': 'Midleton Very Rare 2002'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Michel Couvreur Special Vatting Peaty '}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Laphroaig 25 Year Old'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'The Macallan Rare Cask Batch No.1 2018 Release'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'William Larue Weller 2017 Release 128.2 Proof'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Glen Moray Rare Vintage 1987 25 Year Old Port Cask Finish Batch 2'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'High West A Midwinter Nights Dram Limited Engagement Act 7 Scene 4'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Laphroaig Extremely Rare 30 Year Old'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Ardbeg Supernova 2019 Committee Release'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Caol Ila Cask Strength Distillery Exclusive 2017 Release'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'The Macallan 1824 Collection Estate Reserve Travel Retail Exclusive'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Ardbeg Supernova 2014 Committee Release'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'The Macallan Edition No.1'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': "Midleton Dair Ghaelach Grinsell's Wood"}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'George T. Stagg Bourbon 2019 Release 116.9 Proof'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Glenmorangie 25 Year Old'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'The Loch Fyne Craigellachie 10 Year Old'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'The Macallan 1997 18 Year Old Sherry Oak Cask Matured'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Lot No.40 Cask Strength Rye 1st Edition 12 Year Old'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Breaker Bourbon Port Barrel Finish Special Edition'}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': "Parker's Heritage Collection Single Barrel Bourbon 11 Year Old"}
2022-09-12 00:25:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.whizzky.net/webapi/get_finder_results.php?cid=&flavours=&view=rated&price=3&country=&regions=>
{'Title': 'Tomintoul 27 Year Old'}
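
The parse callback above indexes search_results and product_title directly, which raises an error if the API returns an empty or missing search_results (as in the question, when the POST body was not sent). A minimal sketch of a more defensive extraction follows. The function name extract_titles is hypothetical, the keys are the ones seen in the response above, and stripping whitespace is an optional choice prompted by the trailing space in one of the titles.

```python
def extract_titles(payload):
    """Return cleaned product titles from a decoded finder response.

    Hypothetical helper: tolerates a missing, null, or empty
    'search_results' entry and skips cards without a title.
    """
    results = payload.get("search_results") or []
    titles = []
    for card in results:
        title = card.get("product_title") if isinstance(card, dict) else None
        if title:
            titles.append(title.strip())
    return titles


# Sample data shaped like the response above (titles taken from the output).
sample = {
    "search_results": [
        {"product_title": "Laphroaig 25 Year Old"},
        {"product_title": "Michel Couvreur Special Vatting Peaty "},
        {"other_field": "no title here"},
    ]
}

print(extract_titles(sample))
print(extract_titles({"search_results": None}))
```

Inside a spider, the callback could then loop over extract_titles(response.json()) and yield one item per title.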