Get an HTML attribute by the element's content when web scraping with bs4

Time:11-11

I'm trying to get an attribute of an HTML element when all I know is the element's text content. I know how to find an element by one of its attributes and then read another attribute from it, but this time I need to look the element up by its content.

This is the page I'm trying to scrape: "https://www.skatedeluxe.ch/de/adidas-skateboarding-busenitz-vulc-ii-schuh-white-collegiate-navy-bluebird_p155979?cPath=216&value[55][]=744"

I'm trying to get the "data-id" of the option whose text is " US 12".

This is how it looks

I tried to do it the same way I'd look up an element by an attribute. This is my code:

def carting ():
    a = session.get(producturl, headers=headers, proxies=proxy)
    soup = BeautifulSoup(a.text, "html.parser")
    product_id = soup.find("div", {"class" : "product-grid"})["data-product-id"]
    option_id = soup.find("option", {"option" : " US 12"})["data-id"]
    print(option_id)
carting()

This is what I get:

'NoneType' object is not subscriptable

I know the code is wrong and can't work as written, but I can't figure out how else to do it. I'd appreciate any help, and if you need more information, just ask. Kind regards

CodePudding user response:

Try:

import requests
from bs4 import BeautifulSoup


url = "https://www.skatedeluxe.ch/de/adidas-skateboarding-busenitz-vulc-ii-schuh-white-collegiate-navy-bluebird_p155979?cPath=216&value[55][]=744"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

sizes = soup.select_one("#product-size-chooser")
print(sizes.select_one('option:-soup-contains("US 12")')["data-id"])

Print:

16
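The selector can also be tried offline. The sketch below uses a made-up stand-in for the size dropdown (the markup and the data-id values are assumptions, not copied from the real page) and shows why the lookup in the question returns None: a dict passed to find() filters on attributes, so it searches for an option tag that has an attribute named "option".

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's size dropdown; the data-id values are invented.
html = """
<select id="product-size-chooser">
  <option value="">Choose size</option>
  <option data-id="15"> US 11 </option>
  <option data-id="16"> US 12 </option>
</select>
"""
soup = BeautifulSoup(html, "html.parser")

# The dict is an attribute filter: this looks for <option option=" US 12">,
# which does not exist, so find() returns None.
by_attribute = soup.find("option", {"option": " US 12"})
print(by_attribute)

# Matching on the text content instead finds the tag.
by_text = soup.select_one('#product-size-chooser option:-soup-contains("US 12")')

# Guard against None before subscripting to avoid the error from the question.
data_id = by_text["data-id"] if by_text is not None else None
print(data_id)
```

Indexing the None result with ["data-id"] is what raises "'NoneType' object is not subscriptable", so checking the result before subscripting is worth keeping in the real scraper too.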

CodePudding user response:

I suggest matching the text with a regex, since it has whitespace around it:

import re

soup.find("option", string=re.compile("US 12"))["data-id"]
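One caveat: re.compile("US 12") is a substring search, so it would also match a half size such as "US 12.5" if the page lists one. A minimal sketch of an anchored pattern, using invented sample markup (the half-size option and both data-id values are assumptions):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; the data-id values are invented for illustration.
html = """
<select>
  <option data-id="17"> US 12.5 </option>
  <option data-id="16"> US 12 </option>
</select>
"""
soup = BeautifulSoup(html, "html.parser")

# Unanchored: the first option whose text contains "US 12" wins, here "US 12.5".
loose = soup.find("option", string=re.compile("US 12"))

# Anchored: allow surrounding whitespace, but nothing else around "US 12".
exact = soup.find("option", string=re.compile(r"^\s*US 12\s*$"))

loose_id = loose["data-id"] if loose is not None else None
exact_id = exact["data-id"] if exact is not None else None
print(loose_id, exact_id)
```

Whether this matters depends on the sizes the shop actually offers; the anchored version is simply the safer default.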

CodePudding user response:

There are several ways to achieve this:

1st:

You can extract all the options and pick the one you want with a loop:

# find all the option tags that have the attribute "data-id"
for option in soup.find_all("option", attrs={'data-id':True}):
    if option.text.strip() == 'US 12':
        print(option.text.strip(), '/', option['data-id'])
        break

2nd:

You can use a regular expression (regex):

import re

# get the option that has "US 12" in the string
option = soup.find('option', string=re.compile('US 12'))

3rd:

You can use CSS selectors:

# get the option that has the attribute "data-id" and "US 12" in the string
option = soup.select_one('#product-size-chooser > option[data-id]:-soup-contains-own("US 12")')

I recommend you learn more about CSS selectors
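The two text pseudo-classes used in this thread behave differently: :-soup-contains() also looks at the text of descendant elements, while :-soup-contains-own() only looks at text that sits directly inside the matched element. A small sketch on invented markup (the ids "outer" and "inner" are made up for the example):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the text lives in the <span>, not directly in the <div>.
html = '<div id="outer"><span id="inner">US 12</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Matches: the div contains "US 12" through its descendant span.
with_descendants = soup.select_one('div:-soup-contains("US 12")')

# No match: the div has no text node of its own containing "US 12".
own_text_only = soup.select_one('div:-soup-contains-own("US 12")')

# Matches: the span holds the text directly.
span_own = soup.select_one('span:-soup-contains-own("US 12")')

print(with_descendants is not None, own_text_only is None, span_own["id"])
```

For an option tag whose only child is its text, both pseudo-classes select the same element, so either works for the size dropdown.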
