Hi, I am trying to scrape the price of each listed ticket from a ticket website. I am using BeautifulSoup4, but I am not sure how to find an element whose class is nested inside another class. As the screenshot below shows, I am trying to reach the 'AdvisoryPriceDisplay__content' class (at the very bottom of the screenshot), but I am not sure how to do so.
Is it because this is a dynamic website? https://www.stubhub.co.uk/nfl-london-tickets-nfl-london-london-tottenham-hotspur-stadium-9-10-2022/event/105289016/
Screenshot of Inspect on website
My code:
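import requests
from bs4 import BeautifulSoup as bs
# url is assumed to be the event page linked above
url = 'https://www.stubhub.co.uk/nfl-london-tickets-nfl-london-london-tottenham-hotspur-stadium-9-10-2022/event/105289016/'
response = requests.get(url)
response_text = response.content
soup = bs(response_text, features='lxml')
results = soup.find(id='root')
results_1 = results.find('li', class_='RoyalTicketListPanel RoyalTicketListPanel__2')
print(results_1)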
Thanks
CodePudding user response:
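That page pulls its data from an API, which requires a fairly elaborate set of request headers before it returns any data. You can see the request in the browser's dev tools, under the Network tab. The authorization value below was copied from a browser session and has probably expired, so you will likely need to copy a fresh one from your own Network tab. Here is one way to obtain that data: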
import requests
import pandas as pd
headers = {
'authorization': 'Hawk id="1663679215.79344632b8d4eb23", ts="1663678316", nonce="syBUhf", mac="KnuDAZq2Mm12zRGjcdEaelWEDH6sq5mLSWcW1VvG7cI="',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'same-origin',
'accept': 'application/json',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-GB',
'referer': 'https://www.stubhub.co.uk/nfl-london-tickets-nfl-london-london-tottenham-hotspur-stadium-9-10-2022/event/105289016/',
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}
s = requests.Session()
s.headers.update(headers)
r = s.get('https://www.stubhub.co.uk/bfx/api/search/inventory/v2/listings?additionalPricingInfo=true&allSectionZoneStats=true&edgeControlEnabled=true&eventLevelStats=true&eventPricingSummary=true&listingAttributeCategorySummary=true&pricingSummary=true&quantitySummary=true&sectionStats=true&shstore=1&start=0&urgencyMessaging=true&valuePercentage=false&zoneStats=true&scoreVersion=v2&eventId=105289016&quantity=&rows=20&sort=price asc, value desc&priceType=bundledPrice&listingAttributeCategoryList=&excludeListingAttributeCategoryList=&deliveryTypeList=&sectionIdList=&zoneIdList=&pricemin=&pricemax=&listingRows=', headers=headers)
df = pd.json_normalize(r.json()['sectionStats'])
print(df)
Result in terminal:
sectionId sectionName minTicketPrice maxTicketPrice medianTicketPrice averageTicketPrice maxTicketQuantity totalTickets totalListings zoneId zoneName isGA percentiles minTicketPriceWithCurrency.amount minTicketPriceWithCurrency.currency medianTicketPriceWithCurrency.amount medianTicketPriceWithCurrency.currency averageTicketPriceWithCurrency.amount averageTicketPriceWithCurrency.currency maxTicketPriceWithCurrency.amount maxTicketPriceWithCurrency.currency
0 3146900 Level 1 - 107 960.000000 1320.000000 1298.400024 1192.800008 2 5 3 613596 Level 1 0 [{'name': 95.0, 'value': 1317.840002441406}] 960.0 GBP 1298.40 GBP 1192.80 GBP 1320.00 GBP
1 3146951 Level 5 - 522 312.000000 507.880005 396.000000 402.776001 1 5 5 613804 Level 5 0 [{'name': 95.0, 'value': 490.30400390625}] 312.0 GBP 396.00 GBP 402.78 GBP 507.88 GBP
2 3146838 Level 2 - 258 378.000000 1673.000000 582.000000 807.457145 6 15 7 613798 Level 2 0 [{'name': 95.0, 'value': 1498.6999999999996}] 378.0 GBP 582.00 GBP 807.46 GBP 1673.00 GBP
3 3146942 Level 1 - 115 396.000000 840.000000 558.000000 588.000000 2 5 4 613596 Level 1 0 [{'name': 95.0, 'value': 803.9999999999999}] 396.0 GBP 558.00 GBP 588.00 GBP 840.00 GBP
4 3146974 Level 3 Premium - 320 1378.800049 1378.800049 1378.800049 1378.800049 2 2 1 613801 Premium 3 0 [{'name': 95.0, 'value': 1378.800048828125}] 1378.8 GBP 1378.80 GBP 1378.80 GBP 1378.80 GBP
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
67 3146800 Level 3 Premium - 314 1798.800049 1798.800049 1798.800049 1798.800049 6 6 1 613801 Premium 3 0 [{'name': 95.0, 'value': 1798.800048828125}] 1798.8 GBP 1798.80 GBP 1798.80 GBP 1798.80 GBP
68 3146870 Level 4 - 451 300.000000 300.000000 300.000000 300.000000 1 1 1 613802 Level 4 0 [{'name': 95.0, 'value': 300.0}] 300.0 GBP 300.00 GBP 300.00 GBP 300.00 GBP
69 3146919 Level 4 - 416 480.000000 717.000000 598.500000 598.500000 2 3 2 613802 Level 4 0 [{'name': 95.0, 'value': 705.15}] 480.0 GBP 598.50 GBP 598.50 GBP 717.00 GBP
70 3146876 Level 2 - 256 1364.400024 1364.400024 1364.400024 1364.400024 5 5 1 613798 Level 2 0 [{'name': 95.0, 'value': 1364.4000244140625}] 1364.4 GBP 1364.40 GBP 1364.40 GBP 1364.40 GBP
71 3146964 Level 1 - 110 540.000000 1090.910034 776.750000 796.102509 3 7 4 613596 Level 1 0 [{'name': 95.0, 'value': 1052.7485290527343}] 540.0 GBP 776.75 GBP 796.10 GBP 1090.91 GBP
72 rows × 21 columns
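The dotted column names in the output above (for example minTicketPriceWithCurrency.amount) come from pd.json_normalize flattening nested objects into "parent.child" columns. A minimal sketch of that behaviour follows, using a made-up two-row sample that only mimics the shape of the sectionStats entries; the sectionId values and the reduced set of keys are assumptions for illustration, not real API output:

```python
import pandas as pd

# Hypothetical sample shaped like entries of r.json()['sectionStats'];
# the ids and the reduced key set are invented for illustration.
sample = [
    {
        "sectionId": 1,
        "sectionName": "Level 1 - 107",
        "minTicketPriceWithCurrency": {"amount": 960.0, "currency": "GBP"},
    },
    {
        "sectionId": 2,
        "sectionName": "Level 5 - 522",
        "minTicketPriceWithCurrency": {"amount": 312.0, "currency": "GBP"},
    },
]

# Nested dicts become columns named "<parent>.<child>".
df = pd.json_normalize(sample)

# Keep only the columns of interest and sort by the cheapest ticket.
cheapest = df[
    ["sectionName", "minTicketPriceWithCurrency.amount"]
].sort_values("minTicketPriceWithCurrency.amount")
print(cheapest)
```

The same column selection and sort should work on the full DataFrame from the answer, since it has both of these columns.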
Pandas docs: https://pandas.pydata.org/pandas-docs/stable/index.html
Also, requests docs: https://requests.readthedocs.io/en/latest/
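The long query string in the request above can be easier to read and change if it is kept as a dictionary. The sketch below builds a URL from a subset of those parameters using only the standard library; the names BASE_URL and params are my own, and whether the API still answers with only this subset of parameters is an assumption I have not verified:

```python
from urllib.parse import urlencode

# Endpoint taken from the request above, without its query string.
BASE_URL = "https://www.stubhub.co.uk/bfx/api/search/inventory/v2/listings"

# A subset of the query parameters from the original URL (assumption:
# the remaining ones are optional; add them back if the API needs them).
params = {
    "eventId": "105289016",
    "sectionStats": "true",
    "start": 0,
    "rows": 20,
    "sort": "price asc, value desc",
    "priceType": "bundledPrice",
}

# urlencode percent-encodes the values (spaces become "+", commas "%2C").
full_url = f"{BASE_URL}?{urlencode(params)}"
print(full_url)
```

With requests, the same dictionary can be passed directly as s.get(BASE_URL, params=params), which does the encoding for you.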