I'm trying to fetch product title
and it's description
from a
Requests done from python vs a browser are the same thing. If the headers, URLs, and parameters are identical, they should receive identical responses. So the next step is comparing the difference between your request and the request done by the browser:
So one or more of the headers included by the browser gets a good response from the server, but just using User-Agent
is not enough.
I would try to identify which headers, but unfortunately, Nordstrom detected some 'unusual activity' and seems to have blocked my IP :( Probably due to sending an obvious handmade request. I think it's my IP that's blocked since I can't access the site from any browser, even after clearing my cache.
So double-check that the same hasn't happened to you while working with your scraper.
Best of luck!
CodePudding user response:
The page is dynamic. go after the data from the api source:
import requests
import pandas as pd
api = 'https://www.nordstrom.com/api/ng-looks/styleId/6638030?customerId=f36cf526cfe94a72bfb710e5e155f9ba&limit=7'
jsonData = requests.get(api).json()
df = pd.json_normalize(jsonData['products'].values())
print(df.iloc[0])
Output:
id 6638030-400
name ANINE BING Women's Plaid Shirt
styleId 6638030
styleNumber
colorCode 400
colorName BLUE
brandLabelName ANINE BING
hasFlatShot True
imageUrl https://n.nordstrommedia.com/id/sr3/6d000f40-8...
price $149.00
pathAlias anine-bing-womens-plaid-shirt/6638030?origin=c...
originalPrice $149.00
productTypeLvl1 12
productTypeLvl2 216
isUmap False
Name: 0, dtype: object