Scrape multiple pages with json

Time:07-05

I am trying to scrape multiple pages from a JSON API, but the request returns an error. I have searched for a fix and have not found one that solves my problem. Is there a solution?

    import requests
    import json
    import pandas as pd
    headers = {
        'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8,pt;q=0.7',
        'Connection': 'keep-alive',
        'Origin': 'https://www.nationalhardwareshow.com',
        'Referer': 'https://www.nationalhardwareshow.com/',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'cross-site',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
        'accept': 'application/json',
        'content-type': 'application/x-www-form-urlencoded',
        'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
    }
    
    params = {
        'x-algolia-agent': 'Algolia for vanilla JavaScript 3.27.1',
        'x-algolia-application-id': 'XD0U5M6Y4R',
        'x-algolia-api-key': 'd5cd7d4ec26134ff4a34d736a7f9ad47',
    }
    for i in range(0,4):
        data = '{"params":"query=&page={i}&facetFilters=&optionalFilters=[]"}'
    
        resp = requests.post('https://xd0u5m6y4r-dsn.algolia.net/1/indexes/event-edition-eve-e6b1ae25-5b9f-457b-83b3-335667332366_en-us/query', params=params, headers=headers, data=data).json()
    
        req_json=resp
        df = pd.DataFrame(req_json['hits'])
        f = pd.DataFrame(df[['name','representedBrands','description']])
        print(f)

CodePudding user response:

Try concatenating the variable i into the data string, then collect each page in a list and combine them at the end:

    import requests
    import pandas as pd

    headers = {
        'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8,pt;q=0.7',
        'Connection': 'keep-alive',
        'Origin': 'https://www.nationalhardwareshow.com',
        'Referer': 'https://www.nationalhardwareshow.com/',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'cross-site',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
        'accept': 'application/json',
        'content-type': 'application/x-www-form-urlencoded',
        'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
    }

    params = {
        'x-algolia-agent': 'Algolia for vanilla JavaScript 3.27.1',
        'x-algolia-application-id': 'XD0U5M6Y4R',
        'x-algolia-api-key': 'd5cd7d4ec26134ff4a34d736a7f9ad47',
    }

    lst = []
    for i in range(0, 4):
        # Concatenate the page number into the request body.
        data = '{"params":"query=&page=' + str(i) + '&facetFilters=&optionalFilters=[]"}'

        resp = requests.post('https://xd0u5m6y4r-dsn.algolia.net/1/indexes/event-edition-eve-e6b1ae25-5b9f-457b-83b3-335667332366_en-us/query', params=params, headers=headers, data=data).json()

        # Keep only the columns of interest from this page.
        df = pd.DataFrame(resp['hits'])
        lst.append(df[['name', 'representedBrands', 'description']])

    # Combine all pages into a single DataFrame.
    d = pd.concat(lst)
    print(d)
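
The page loop can also be separated from the HTTP call, which makes it easy to try without hitting the API. The following is a minimal sketch, not the answerer's method: `collect_rows` and `fake_fetch` are hypothetical names, `fake_fetch` returns invented sample data in place of the real POST request, and stopping at the first empty page is an assumption about how the API signals the end of the results.

```python
FIELDS = ("name", "representedBrands", "description")


def collect_rows(fetch_page, pages):
    """Gather the selected fields from every page returned by fetch_page."""
    rows = []
    for page in pages:
        hits = fetch_page(page).get("hits", [])
        if not hits:
            # Assumption: an empty page means there are no more results.
            break
        for hit in hits:
            # .get() leaves None where a record lacks one of the fields.
            rows.append({field: hit.get(field) for field in FIELDS})
    return rows


def fake_fetch(page):
    """Stand-in for the real POST request, with invented sample data."""
    sample = {
        0: [{"name": "A", "description": "first"}],
        1: [{"name": "B", "representedBrands": ["X"], "description": "second"}],
    }
    return {"hits": sample.get(page, [])}


rows = collect_rows(fake_fetch, range(0, 4))
print(rows)
```

In real use, `fetch_page` would wrap the `requests.post(...).json()` call from the code above, and `pd.DataFrame(rows)` would build one table from all pages without a separate `pd.concat` step.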

CodePudding user response:

The request returns status code 400 (Bad Request) because the data you send is malformed. The string is not an f-string, so the literal text {i} is sent instead of the page number. Change:

    data = '{"params":"query=&page={i}&facetFilters=&optionalFilters=[]"}'

to

    data = '{"params":"query=&page=' + str(i) + '&facetFilters=&optionalFilters=[]"}'

and it should work. Hope this helps.
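
As an alternative to string concatenation, the body can be built with an f-string for the inner query and `json.dumps` for the outer object. This is a minimal sketch assuming the same query string as in the question. `build_payload` is a hypothetical helper name, not part of the original code.

```python
import json


def build_payload(page):
    """Return the JSON request body for one result page.

    build_payload is a hypothetical helper, not part of the original code.
    """
    # The f prefix makes Python substitute the value of page.
    query = f"query=&page={page}&facetFilters=&optionalFilters=[]"
    # json.dumps takes care of quoting the outer JSON object.
    return json.dumps({"params": query})


# Without the f prefix the braces are ordinary characters,
# so the placeholder is sent to the server unchanged.
broken = '{"params":"query=&page={i}&facetFilters=&optionalFilters=[]"}'
print("{i}" in broken)
print(build_payload(2))
```

Putting an f prefix directly on the original string would require doubling the outer braces (`{{` and `}}`), which is why the sketch formats only the inner query string. The result can be passed as `data=build_payload(i)` inside the loop.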
