Scrapy extracting data from json response

Time:12-21

I'm trying to extract data from a JSON response using Scrapy. The aim is to get the products listed in the response:

import scrapy
import json

class DepopSpider(scrapy.Spider):
    name = 'depop'
    allowed_domains = ["depop.com"]
    start_urls = ['https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance']
def parse(self, response):
    data = json.loads(response.body)
    yield from data['meta']['products']

I get the following error:

ERROR: Spider error processing <GET https://webapi.depop.com/api/v2/search/products/?brands=1596&itemsPerPage=24&country=gb&currency=GBP&sort=relevance> (referer: None)

CodePudding user response:

Here is a minimal working spider using Scrapy. Note that it reads the top-level 'products' key of the response rather than looking for it under 'meta'.

Script:

import scrapy
import json


class DepopSpider(scrapy.Spider):
    name = 'depop'

    def start_requests(self):
        yield scrapy.Request(
            url='https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance',
            method='GET',
            callback=self.parse,
        )

    def parse(self, response):
        # response.json() parses the body; 'products' is a top-level key
        products = response.json()['products']

        # Optional: dump the raw product list to a file
        # with open('data.json', 'w') as f:
        #     json.dump(products, f)

        for item in products:
            yield {
                'Name': item['slug'],
                'price': item['price']['priceAmount'],
            }

Output:

{'Name': 'kicksbrothers-exclusive-genuine-blue-inc', 'price': '22.98'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'isabellaimogen-crew-clothing-full-length-slim', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'elliewarwick97-vintage-anchor-blue-shirt-size', 'price': '5.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'elliewarwick97-vintage-anchor-blue-brand-1990s', 'price': '5.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'tommkent-high-waisted-vintage-jeans-washed', 'price': '24.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'megsharp-super-cute-flowery-anchor-blue', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'moniulka2607-sweat-wear-for-man-shorts', 'price': '30.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'quynheu-free-uk-shipping-anchor-blue-07e1', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'bradymonster-oversized-stone-washed-shirt-from', 'price': '14.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'bonebear-vintage-funky-mens-large-shirt', 'price': '9.99'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'katy_potaty-vintage-anchor-blue-mom-jeanstrousers', 'price': '20.00'}       
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'urielbongco-washed-up-denim-jacket-preloved', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'reubz16--thick-thermal-heavy-t-shirt', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'reubz16--vintage-egypt-tourist-tee', 'price': '16.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'kristoferjohnson-blue-harbour-mens-tailored-fit', 'price': '7.99'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'ravsonline-blue-willis-pure-indigo-cotton', 'price': '27.20'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'shikhalamode-anchor-blue-low-rise-denim', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>

... and so on.
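To see why the lookup in the question fails, here is a small offline sketch. It assumes the response has the shape printed further down in this thread ('meta' and 'products' as sibling keys); SAMPLE is a hand-trimmed copy of that payload with a single product, not a live API response.

```python
import json

# Trimmed copy of the payload shape shown in this thread (one product only).
SAMPLE = '''
{
    "meta": {"resultCount": 20, "cursor": "MnwyMHwxNjQwMDA1ODc3", "hasMore": false, "totalCount": 20},
    "products": [
        {"id": 215371070,
         "slug": "kicksbrothers-exclusive-genuine-blue-inc",
         "price": {"priceAmount": "22.98", "currencyName": "GBP"}}
    ]
}
'''

data = json.loads(SAMPLE)

# The lookup from the question: 'meta' has no 'products' key, so this raises.
try:
    data['meta']['products']
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

# The lookup that works: 'products' sits at the top level.
products = data['products']

print(repr(lookup_error))
print(products[0]['slug'], products[0]['price']['priceAmount'])
```

If the real payload matches this shape, changing data['meta']['products'] to data['products'] in the original spider should be enough.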

CodePudding user response:

If you just want to inspect the JSON response, you can try this with requests:

import requests

url = "https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"

payload={}
headers = {}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)

The output looks like this:

{
    "meta": {
        "resultCount": 20,
        "cursor": "MnwyMHwxNjQwMDA1ODc3",
        "hasMore": false,
        "totalCount": 20
    },
    "products": [
        {
            "id": 215371070,
            "slug": "kicksbrothers-exclusive-genuine-blue-inc",
            "status": "ONSALE",
            "hasVideo": false,
            "price": {
                "priceAmount": "22.98",
                "currencyName": "GBP",
                "nationalShippingCost": "4.99",
                "internationalShippingCost": "10.00"
            },
            "preview": {
                "150": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P2.jpg",
                "210": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P4.jpg",
                "320": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P5.jpg",
                "480": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P6.jpg",
                "640": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P1.jpg",
                "960": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P7.jpg",
                "1280": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P8.jpg"
            },
            "variantSetId": 93,
            "variants": {
                "7": 1
            },
            "isLiked": false
        },

How to parse the JSON response:

import requests
import json

def get_requests():
    url = "https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"
    payload = {}
    headers = {}
    response = requests.request("GET", url, headers=headers, data=payload)
    return response.text

# x holds the raw JSON text returned by get_requests()
x = get_requests()

data_json = json.loads(x)
# Each entry in 'products' carries both the id and the price,
# so a single loop is enough (no zip needed)
for product in data_json['products']:
    print(product['id'])
    print(product['price']['priceAmount'])

Output:

215371070
22.98
256715789
8.00
202721541
5.00
202722546
5.00
274328291
24.00
221641139
10.00
245419941
30.00
192541316
8.00
147762409
14.00
158406248
9.99
234693030
20.00
213377081
10.00
228630951
10.00
203627182
16.00
159958157
7.99
151413456
27.20
250985338
8.00
185488012
15.00
154423470
20.00
193888222
10.00

This loops through the 'products' list in the JSON response and prints only the values of the "id" key and the nested "price" -> "priceAmount" key.
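If you want to reuse that loop, one option is to wrap it in a function that works on already parsed data, so it can be tried without hitting the network. This is only a sketch: iter_id_and_price is a made-up helper name, the sample dict is hand-written in the shape shown above, and converting the price string to Decimal is my own choice, not something the API requires.

```python
import json
from decimal import Decimal


def iter_id_and_price(payload):
    """Yield (id, price) pairs from an already parsed search response."""
    for product in payload.get('products', []):
        price = product.get('price', {}).get('priceAmount')
        if price is None:
            continue  # skip entries that carry no price
        yield product['id'], Decimal(price)


# Hand-written sample: one complete product and one without a price.
sample = json.loads(
    '{"products": ['
    '{"id": 215371070, "price": {"priceAmount": "22.98"}},'
    '{"id": 1}'
    ']}'
)

pairs = list(iter_id_and_price(sample))
for product_id, price in pairs:
    print(product_id, price)
```

With live data you would pass json.loads(get_requests()) instead of the sample.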
