Can't scrape information from a static webpage using requests module

Time:06-14

I'm trying to fetch a product title and its description from a simple response body

simple response headers

To the server, a request made from Python and one made from a browser are the same thing. If the headers, URL, and parameters are identical, they should receive identical responses. So the next step is to compare your request with the one the browser sends.

So one or more of the headers included by the browser gets a good response from the server, but just using User-Agent is not enough.
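One way to make that comparison is to copy the request headers from the browser's developer tools and diff them against what the script sends. The sketch below is only an illustration: `diff_headers` is a hypothetical helper, and the header values are made-up examples, not the ones Nordstrom actually checks.

```python
def diff_headers(script_headers, browser_headers):
    """Compare two header dicts, ignoring case in header names.

    Returns (missing, different): headers the browser sends that the
    script does not, and headers both send but with different values.
    """
    mine = {name.lower(): value for name, value in script_headers.items()}
    missing, different = {}, {}
    for name, value in browser_headers.items():
        key = name.lower()
        if key not in mine:
            missing[name] = value
        elif mine[key] != value:
            different[name] = (mine[key], value)
    return missing, different


# Illustrative values only; paste the real ones from your browser's network tab.
script_headers = {"user-agent": "python-requests/2.x", "Accept": "*/*"}
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.9",
}

missing, different = diff_headers(script_headers, browser_headers)
print("missing:", missing)
print("different:", different)
```

The headers reported as missing or different are the candidates to add to the `headers=` argument of `requests.get`, a few at a time, until the response matches what the browser gets.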

I would have tried to identify which headers matter, but unfortunately Nordstrom detected some 'unusual activity' and seems to have blocked my IP :( That is probably because I sent an obviously handmade request. I think it's my IP that's blocked, since I can't access the site from any browser, even after clearing my cache.

So double-check that the same hasn't happened to you while working with your scraper.
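A rough way to check for this from inside the scraper is to test each response before parsing it. This is a heuristic sketch, not how Nordstrom's blocking actually works: `looks_blocked` is a hypothetical helper, the status codes 403 and 429 are an assumption about how a block might be signalled, and the marker phrase is taken from the 'unusual activity' message described above.

```python
def looks_blocked(status_code, body):
    """Heuristic: True if a response looks like a block page, not real content."""
    # Assumption: a block is signalled by 403 (forbidden) or 429 (too many requests).
    if status_code in (403, 429):
        return True
    # Assumption: the block page mentions "unusual activity" somewhere in its text.
    return "unusual activity" in body.lower()


print(looks_blocked(200, "<html><title>Plaid Shirt</title></html>"))
print(looks_blocked(403, ""))
print(looks_blocked(200, "We've noticed some Unusual Activity"))
```

With the requests module this would be called as `looks_blocked(resp.status_code, resp.text)` on the response object, and the scraper should stop rather than retry when it returns `True`.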

Best of luck!

CodePudding user response:

The page is dynamic. Go after the data from the API source:

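import requests
import pandas as pd

# API endpoint that supplies the product data for the page
api = 'https://www.nordstrom.com/api/ng-looks/styleId/6638030?customerId=f36cf526cfe94a72bfb710e5e155f9ba&limit=7'
jsonData = requests.get(api).json()

# 'products' is a dict of product records; flatten its values into one row per product
df = pd.json_normalize(list(jsonData['products'].values()))

# Show the first product
print(df.iloc[0])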

Output:

id                                                       6638030-400
name                                  ANINE BING Women's Plaid Shirt
styleId                                                      6638030
styleNumber                                                         
colorCode                                                        400
colorName                                                       BLUE
brandLabelName                                            ANINE BING
hasFlatShot                                                     True
imageUrl           https://n.nordstrommedia.com/id/sr3/6d000f40-8...
price                                                        $149.00
pathAlias          anine-bing-womens-plaid-shirt/6638030?origin=c...
originalPrice                                                $149.00
productTypeLvl1                                                   12
productTypeLvl2                                                  216
isUmap                                                         False
Name: 0, dtype: object
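If only a few fields are needed, the same `products` mapping can be reduced without pandas. The sketch below is an assumption-laden illustration: `pick_fields` is a hypothetical helper, and the sample data is a hand-written stand-in whose field values are copied from the output above (the dictionary key is a guess at how the records are keyed).

```python
def pick_fields(products, fields=("name", "brandLabelName", "price")):
    """Return one small dict per product, with None for any absent field."""
    return [
        {field: product.get(field) for field in fields}
        for product in products.values()
    ]


# Stand-in for jsonData['products']; the key "6638030-400" is an assumption.
sample_products = {
    "6638030-400": {
        "id": "6638030-400",
        "name": "ANINE BING Women's Plaid Shirt",
        "brandLabelName": "ANINE BING",
        "price": "$149.00",
    }
}

rows = pick_fields(sample_products)
print(rows)
```

Using `.get` means a record that lacks one of the requested fields yields `None` for it instead of raising `KeyError`.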