So I'm currently trying to get a certain attribute just by the content of the HTML element. I know how to get an attribute by another attribute in the same HTML section. But this time I need the attribute by the content of the section.
"https://www.skatedeluxe.ch/de/adidas-skateboarding-busenitz-vulc-ii-schuh-white-collegiate-navy-bluebird_p155979?cPath=216&value[55][]=744" this is the link I try to scrape.
So I'm trying to get the "data-id" just by the " US 12"
What I tried to do is getting it similar to how I'd get an attribute by an attribute. This is my code:
def carting ():
a = session.get(producturl, headers=headers, proxies=proxy)
soup = BeautifulSoup(a.text, "html.parser")
product_id = soup.find("div", {"class" : "product-grid"})["data-product-id"]
option_id = soup.find("option", {"option" : " US 12"})["data-id"]
print(option_id)
carting()
This is what I get:
'NoneType' object is not subscriptable
I know that the code is wrong and doesn't work like I wrote it but I cannot figure how else I'm supposed to do it. Would appreciate help and ofc if you need more information just ask. Kind Regards
CodePudding user response:
Try:
import requests
from bs4 import BeautifulSoup
url = "https://www.skatedeluxe.ch/de/adidas-skateboarding-busenitz-vulc-ii-schuh-white-collegiate-navy-bluebird_p155979?cPath=216&value[55][]=744"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
sizes = soup.select_one("#product-size-chooser")
print(sizes.select_one('option:-soup-contains("US 12")')["data-id"])
Print:
16
CodePudding user response:
I suggest filtering the text using regex as you have whitespaces around it:
soup.find("option", text=re.compile("US 12"))["data-id"]
CodePudding user response:
there are a lot of ways to achieve this:
1st:
you can extract all the options and only pick the one you want with a loop
# find all the option tags that have the attribute "data-id"
for option in soup.find_all("option", attrs={'data-id':True}):
if option.text.strip() == 'US 12':
print(option.text.strip(), '/', option['data-id'])
break
2nd:
you can use a regular expression (regex)
import re
# get the option that has "US 12" in the string
option = soup.find('option', string=re.compile('US 12'))
3rd:
using the CSS selectors
# get the option that has the attribute "data-id" and "US 12" in the string
option = soup.select_one('#product-size-chooser > option[data-id]:-soup-contains-own("US 12")')
I recommend you learn more about CSS selectors