How to scrape subclasses in Beautiful Soup

Time:09-21

Hi, I am trying to scrape the price of each ticket listed on a ticket website. I am using BeautifulSoup4, but I am not sure how to find an element whose class is nested inside an element with another class. In the screenshot below, I am trying to reach the 'AdvisoryPriceDisplay__content' class (at the very bottom of the screenshot), but I am not sure how to do so.

Is it because this is a dynamic website? https://www.stubhub.co.uk/nfl-london-tickets-nfl-london-london-tottenham-hotspur-stadium-9-10-2022/event/105289016/

Screenshot of Inspect on website

My code:

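import requests
from bs4 import BeautifulSoup as bs

# Event page from the link above
url = 'https://www.stubhub.co.uk/nfl-london-tickets-nfl-london-london-tottenham-hotspur-stadium-9-10-2022/event/105289016/'

response = requests.get(url)
response_text = response.content
soup = bs(response_text, features='lxml')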

results = soup.find(id='root')
results_1 = results.find('li', class_='RoyalTicketListPanel RoyalTicketListPanel__2')
print(results_1)

Thanks

CodePudding user response:

That page pulls its data from an API, which requires a fairly complex set of headers before it returns any data. You can see the request in the Network tab of your browser's dev tools. Here is one way to obtain that data:

import requests
import pandas as pd



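# Headers copied from the API request shown in the browser's Network tab.
# The Hawk authorization value carries a timestamp, so it will likely need
# to be copied afresh from your own browser session.
headers = {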
    'authorization': 'Hawk id="1663679215.79344632b8d4eb23", ts="1663678316", nonce="syBUhf", mac="KnuDAZq2Mm12zRGjcdEaelWEDH6sq5mLSWcW1VvG7cI="',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-GB',
    'referer': 'https://www.stubhub.co.uk/nfl-london-tickets-nfl-london-london-tottenham-hotspur-stadium-9-10-2022/event/105289016/',
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
    }

# The session sends the headers with every request, so they are set only once
s = requests.Session()
s.headers.update(headers)
r = s.get('https://www.stubhub.co.uk/bfx/api/search/inventory/v2/listings?additionalPricingInfo=true&allSectionZoneStats=true&edgeControlEnabled=true&eventLevelStats=true&eventPricingSummary=true&listingAttributeCategorySummary=true&pricingSummary=true&quantitySummary=true&sectionStats=true&shstore=1&start=0&urgencyMessaging=true&valuePercentage=false&zoneStats=true&scoreVersion=v2&eventId=105289016&quantity=&rows=20&sort=price asc, value desc&priceType=bundledPrice&listingAttributeCategoryList=&excludeListingAttributeCategoryList=&deliveryTypeList=&sectionIdList=&zoneIdList=&pricemin=&pricemax=&listingRows=')
r.raise_for_status()  # stop here if the API rejects the request
# Flatten the per-section statistics into a DataFrame
df = pd.json_normalize(r.json()['sectionStats'])
print(df)

Result in terminal:

sectionId   sectionName minTicketPrice  maxTicketPrice  medianTicketPrice   averageTicketPrice  maxTicketQuantity   totalTickets    totalListings   zoneId  zoneName    isGA    percentiles minTicketPriceWithCurrency.amount   minTicketPriceWithCurrency.currency medianTicketPriceWithCurrency.amount    medianTicketPriceWithCurrency.currency  averageTicketPriceWithCurrency.amount   averageTicketPriceWithCurrency.currency maxTicketPriceWithCurrency.amount   maxTicketPriceWithCurrency.currency
0   3146900 Level 1 - 107   960.000000  1320.000000 1298.400024 1192.800008 2   5   3   613596  Level 1 0   [{'name': 95.0, 'value': 1317.840002441406}]    960.0   GBP 1298.40 GBP 1192.80 GBP 1320.00 GBP
1   3146951 Level 5 - 522   312.000000  507.880005  396.000000  402.776001  1   5   5   613804  Level 5 0   [{'name': 95.0, 'value': 490.30400390625}]  312.0   GBP 396.00  GBP 402.78  GBP 507.88  GBP
2   3146838 Level 2 - 258   378.000000  1673.000000 582.000000  807.457145  6   15  7   613798  Level 2 0   [{'name': 95.0, 'value': 1498.6999999999996}]   378.0   GBP 582.00  GBP 807.46  GBP 1673.00 GBP
3   3146942 Level 1 - 115   396.000000  840.000000  558.000000  588.000000  2   5   4   613596  Level 1 0   [{'name': 95.0, 'value': 803.9999999999999}]    396.0   GBP 558.00  GBP 588.00  GBP 840.00  GBP
4   3146974 Level 3 Premium - 320   1378.800049 1378.800049 1378.800049 1378.800049 2   2   1   613801  Premium 3   0   [{'name': 95.0, 'value': 1378.800048828125}]    1378.8  GBP 1378.80 GBP 1378.80 GBP 1378.80 GBP
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
67  3146800 Level 3 Premium - 314   1798.800049 1798.800049 1798.800049 1798.800049 6   6   1   613801  Premium 3   0   [{'name': 95.0, 'value': 1798.800048828125}]    1798.8  GBP 1798.80 GBP 1798.80 GBP 1798.80 GBP
68  3146870 Level 4 - 451   300.000000  300.000000  300.000000  300.000000  1   1   1   613802  Level 4 0   [{'name': 95.0, 'value': 300.0}]    300.0   GBP 300.00  GBP 300.00  GBP 300.00  GBP
69  3146919 Level 4 - 416   480.000000  717.000000  598.500000  598.500000  2   3   2   613802  Level 4 0   [{'name': 95.0, 'value': 705.15}]   480.0   GBP 598.50  GBP 598.50  GBP 717.00  GBP
70  3146876 Level 2 - 256   1364.400024 1364.400024 1364.400024 1364.400024 5   5   1   613798  Level 2 0   [{'name': 95.0, 'value': 1364.4000244140625}]   1364.4  GBP 1364.40 GBP 1364.40 GBP 1364.40 GBP
71  3146964 Level 1 - 110   540.000000  1090.910034 776.750000  796.102509  3   7   4   613596  Level 1 0   [{'name': 95.0, 'value': 1052.7485290527343}]   540.0   GBP 776.75  GBP 796.10  GBP 1090.91 GBP
72 rows × 21 columns
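
The full frame has 21 columns, which is more than you need for a price overview. As a rough sketch, you could trim it down to a few columns and sort by the cheapest ticket. The sample dictionary below is a hypothetical stand-in for r.json(), shaped like the output above and using values from its first two rows, so the snippet runs without calling the API:

```python
import pandas as pd

# Hypothetical stand-in for r.json(); only a few of the real keys are included,
# with values taken from the first two rows of the output above.
sample_response = {
    "sectionStats": [
        {
            "sectionId": 3146900,
            "sectionName": "Level 1 - 107",
            "minTicketPrice": 960.0,
            "maxTicketPrice": 1320.0,
            "totalTickets": 5,
            "minTicketPriceWithCurrency": {"amount": 960.0, "currency": "GBP"},
        },
        {
            "sectionId": 3146951,
            "sectionName": "Level 5 - 522",
            "minTicketPrice": 312.0,
            "maxTicketPrice": 507.88,
            "totalTickets": 5,
            "minTicketPriceWithCurrency": {"amount": 312.0, "currency": "GBP"},
        },
    ]
}

# Nested dicts become dotted column names, e.g. minTicketPriceWithCurrency.amount
df = pd.json_normalize(sample_response["sectionStats"])

# Keep only the columns needed for a price overview, cheapest section first
prices = (
    df[["sectionName", "minTicketPrice", "maxTicketPrice", "totalTickets"]]
    .sort_values("minTicketPrice")
    .reset_index(drop=True)
)
print(prices)
```

With the real response you would pass r.json()['sectionStats'] to json_normalize instead of the sample.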

Pandas docs: https://pandas.pydata.org/pandas-docs/stable/index.html

Also, requests docs: https://requests.readthedocs.io/en/latest/
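
As for the original question about a class inside another class: on a page where the markup is actually present in the HTML you download, a descendant CSS selector is probably the simplest route. Here is a minimal sketch on a hypothetical static snippet. The class names are taken from your question, but the structure, the second panel class and the price text are assumptions made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical static markup: class names come from the question, while the
# nesting, the RoyalTicketListPanel__3 class and the prices are invented.
html = """
<div id="root">
  <ul>
    <li class="RoyalTicketListPanel RoyalTicketListPanel__2">
      <div class="AdvisoryPriceDisplay__content">GBP 312.00</div>
    </li>
    <li class="RoyalTicketListPanel RoyalTicketListPanel__3">
      <div class="AdvisoryPriceDisplay__content">GBP 396.00</div>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "A B" selects every B anywhere inside an A, however deeply it is nested
price_tags = soup.select("li.RoyalTicketListPanel .AdvisoryPriceDisplay__content")
prices = [tag.get_text(strip=True) for tag in price_tags]
print(prices)
```

This will not help on the StubHub page itself, because the listings are not in the HTML that requests receives, which is why the API route above is needed.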
