I'm trying to extract data from a JSON response using Scrapy. The aim is to get the products listed in the response:
import scrapy
import json
class DepopSpider(scrapy.Spider):
    name = 'depop'
    allowed_domains = ["depop.com"]
    start_urls = ['https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance']
    def parse(self, response):
        data = json.loads(response.body)
        yield from data['meta']['products']
I get the following error:
ERROR: Spider error processing <GET https://webapi.depop.com/api/v2/search/products/?brands=1596&itemsPerPage=24&country=gb&currency=GBP&sort=relevance> (referer: None)
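The most likely cause, judging by the JSON printed further down in this thread, is the last line of `parse()`: `meta` only holds pagination fields (`resultCount`, `cursor`, `hasMore`, `totalCount`), while `products` sits at the top level, so `data['meta']['products']` raises a `KeyError`. A minimal offline sketch of that lookup, using a hand-made, trimmed `SAMPLE` string that mimics the response shape (an assumption, not real API output):

```python
import json

# Hypothetical, trimmed sample mimicking the shape of the Depop search response
SAMPLE = """
{
  "meta": {"resultCount": 1, "cursor": "abc", "hasMore": false, "totalCount": 1},
  "products": [
    {"id": 215371070, "slug": "kicksbrothers-exclusive-genuine-blue-inc",
     "price": {"priceAmount": "22.98", "currencyName": "GBP"}}
  ]
}
"""

data = json.loads(SAMPLE)

# 'meta' holds only pagination info, so looking up 'products' inside it fails
try:
    data["meta"]["products"]
    meta_lookup_failed = False
except KeyError:
    meta_lookup_failed = True

# The product list lives at the top level of the document
products = data["products"]

print(meta_lookup_failed)   # True
print(products[0]["slug"])  # kicksbrothers-exclusive-genuine-blue-inc
```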
Answer:
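Here is a minimal working example using Scrapy. The key fix is that the product list sits at the top level of the JSON (data['products']), not inside meta. Scrapy's response.json() (available since Scrapy 2.2) decodes the body for you: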
Script:
import scrapy
import json
class DepopSpider(scrapy.Spider):
    name = 'depop'
    def start_requests(self):
        yield scrapy.Request(
            url='https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance',
            method='GET',
            callback=self.parse,
        )
    def parse(self, response):
        # 'products' is a top-level key of the JSON, not nested under 'meta'
        resp = response.json()['products']
        # print(resp)
        # json_data = json.dumps(resp)
        # with open('data.json', 'w') as f:
        #     f.write(json_data)
        for item in resp:
            yield {
                'Name': item['slug'],
                'price': item['price']['priceAmount'],
            }
Output:
{'Name': 'kicksbrothers-exclusive-genuine-blue-inc', 'price': '22.98'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'isabellaimogen-crew-clothing-full-length-slim', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'elliewarwick97-vintage-anchor-blue-shirt-size', 'price': '5.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'elliewarwick97-vintage-anchor-blue-brand-1990s', 'price': '5.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'tommkent-high-waisted-vintage-jeans-washed', 'price': '24.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'megsharp-super-cute-flowery-anchor-blue', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'moniulka2607-sweat-wear-for-man-shorts', 'price': '30.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'quynheu-free-uk-shipping-anchor-blue-07e1', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'bradymonster-oversized-stone-washed-shirt-from', 'price': '14.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'bonebear-vintage-funky-mens-large-shirt', 'price': '9.99'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'katy_potaty-vintage-anchor-blue-mom-jeanstrousers', 'price': '20.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'urielbongco-washed-up-denim-jacket-preloved', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'reubz16--thick-thermal-heavy-t-shirt', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'reubz16--vintage-egypt-tourist-tee', 'price': '16.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'kristoferjohnson-blue-harbour-mens-tailored-fit', 'price': '7.99'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'ravsonline-blue-willis-pure-indigo-cotton', 'price': '27.20'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'shikhalamode-anchor-blue-low-rise-denim', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
... and so on
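Because the item-building logic in `parse()` only depends on the decoded JSON, one option is to move it into a plain function that can be tested without running a crawl. A sketch under that assumption; `extract_items` is a hypothetical helper name, and the `sample` dict is hand-made rather than real API output:

```python
from typing import Any, Dict, Iterator


def extract_items(data: Dict[str, Any]) -> Iterator[Dict[str, str]]:
    """Yield one {'Name', 'price'} item per product in a decoded search response."""
    for product in data["products"]:
        yield {
            "Name": product["slug"],
            "price": product["price"]["priceAmount"],
        }


# Inside the spider, parse() could then delegate to it:
#     def parse(self, response):
#         yield from extract_items(response.json())

sample = {
    "meta": {"hasMore": False},
    "products": [
        {"slug": "megsharp-super-cute-flowery-anchor-blue",
         "price": {"priceAmount": "10.00"}},
    ],
}

for item in extract_items(sample):
    print(item)
```

Keeping the spider callback a one-liner means the part most likely to break when the JSON shape changes can be checked against a saved response body offline.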
Answer:
If you just want to look at the raw JSON response, you can also fetch it with the requests library:
import requests
url = "https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"
payload={}
headers = {}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
The output looks like this (truncated):
{
  "meta": {
    "resultCount": 20,
    "cursor": "MnwyMHwxNjQwMDA1ODc3",
    "hasMore": false,
    "totalCount": 20
  },
  "products": [
    {
      "id": 215371070,
      "slug": "kicksbrothers-exclusive-genuine-blue-inc",
      "status": "ONSALE",
      "hasVideo": false,
      "price": {
        "priceAmount": "22.98",
        "currencyName": "GBP",
        "nationalShippingCost": "4.99",
        "internationalShippingCost": "10.00"
      },
      "preview": {
        "150": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P2.jpg",
        "210": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P4.jpg",
        "320": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P5.jpg",
        "480": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P6.jpg",
        "640": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P1.jpg",
        "960": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P7.jpg",
        "1280": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P8.jpg"
      },
      "variantSetId": 93,
      "variants": {
        "7": 1
      },
      "isLiked": false
    },
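Note that `priceAmount` and the shipping costs come back as JSON strings (`"22.98"`), not numbers. If you need to do arithmetic with them, converting to `decimal.Decimal` keeps the amounts exact instead of going through binary floats. A minimal sketch using the price block copied from the product above; `total_uk_price` is a hypothetical helper, and treating "price + national shipping" as a meaningful total is an illustrative assumption only:

```python
from decimal import Decimal

# Price block copied from the product shown above (all values are strings)
price = {
    "priceAmount": "22.98",
    "currencyName": "GBP",
    "nationalShippingCost": "4.99",
    "internationalShippingCost": "10.00",
}


def total_uk_price(price_block: dict) -> Decimal:
    """Illustrative only: item price plus national shipping, as an exact Decimal."""
    return Decimal(price_block["priceAmount"]) + Decimal(price_block["nationalShippingCost"])


print(total_uk_price(price))  # 27.97
```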
To parse the JSON response and print each product's ID and price:
import requests
import json
def get_requests():
    url = "https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"
    payload = {}
    headers = {}
    response = requests.request("GET", url, headers=headers, data=payload)
    return response.text
# x holds the raw response body returned by get_requests()
x = get_requests()
data_json = json.loads(x)
# Each element of the top-level 'products' list is one product dict
for product in data_json['products']:
    print(product['id'])
    print(product['price']['priceAmount'])
Output:
215371070
22.98
256715789
8.00
202721541
5.00
202722546
5.00
274328291
24.00
221641139
10.00
245419941
30.00
192541316
8.00
147762409
14.00
158406248
9.99
234693030
20.00
213377081
10.00
228630951
10.00
203627182
16.00
159958157
7.99
151413456
27.20
250985338
8.00
185488012
15.00
154423470
20.00
193888222
10.00
This loops through the products list in the JSON response and prints the values of the "id" key and the nested "price" -> "priceAmount" key for each product.
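If some products might lack a field (an assumption; the sample response above shows every product with both `id` and `price`), `dict.get` with a default keeps the loop from stopping on the same kind of `KeyError` as in the question. A sketch on a hand-made JSON string; `iter_id_price` is a hypothetical helper, and the second product (`"id": 1`, no price) is invented to show the fallback:

```python
import json
from typing import Iterator, Optional, Tuple


def iter_id_price(raw: str) -> Iterator[Tuple[Optional[int], Optional[str]]]:
    """Yield (id, priceAmount) per product, using None for any missing field."""
    data = json.loads(raw)
    for product in data.get("products", []):
        # Fall back to an empty dict so the nested lookup cannot raise
        price = product.get("price") or {}
        yield product.get("id"), price.get("priceAmount")


raw = '{"products": [{"id": 215371070, "price": {"priceAmount": "22.98"}}, {"id": 1}]}'
for product_id, amount in iter_id_price(raw):
    print(product_id, amount)
```

Whether silently yielding `None` is preferable to failing loudly depends on how the scraped data is used downstream.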