How do I grab the string on the next line in HTML code following <span> tag with specific clas

Time:02-20

I'm trying to scrape product specifications from an e-commerce website. I have a list of URLs to various products, and I need my code to visit each one (this part is easy) and scrape the product specs I need. I have been trying to use ParseHub; it works for some links but not for others. My suspicion is that a field such as 'Wheel Diameter' changes its position from page to page, so the tool ends up grabbing the wrong spec value.

For example, one such part looks like this in HTML:

<div>
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>

What I think I could do is use BeautifulSoup with something like:

if soup.find("span", class_="product-detail-key").text.strip() == "Wheel Diameter":
    ...  # go to the next line and grab the string inside

How can I code this? I apologize if my question sounds silly; I'm pretty new to web scraping.

CodePudding user response:

You can use the .find_next() method:

from bs4 import BeautifulSoup

html_doc = """
<div>
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# string= is the current name of the older text= argument
diameter = soup.find("span", string="Wheel Diameter").find_next("span").text
print(diameter)

Prints:

8 Inches

Or using a CSS selector with the adjacent sibling combinator (+):

diameter = soup.select_one('.product-detail-key:-soup-contains("Wheel Diameter") + *').text
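
One thing worth keeping in mind: .find_next() returns the next matching element in document order, even if it sits outside the current <div>, so a label with no value can silently pick up a neighbouring element. A minimal sketch of a stricter lookup follows, using .find_next_sibling(), which only looks at siblings under the same parent. The helper name get_spec and the extra "Color" and "Weight" rows are hypothetical, added only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the "Color" row deliberately has no value element.
html_doc = """
<div>
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>
<div>
  <span class="product-detail-key">Color</span>
</div>
<div>
  <span class="product-detail-key">Weight</span>
  <span data-product-custom-field="">12 lb</span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def get_spec(soup, label):
    """Return the value next to `label`, or None if the label or its value is missing."""
    key = soup.find("span", string=label)
    if key is None:
        return None
    # find_next_sibling() stays inside the same parent element
    value = key.find_next_sibling("span")
    return value.get_text(strip=True) if value else None


print(get_spec(soup, "Wheel Diameter"))  # value from the same <div>
print(get_spec(soup, "Color"))           # None: no value in that <div>

# For comparison, find_next() walks on into the following <div>:
print(soup.find("span", string="Color").find_next("span").text)
```

Here the last line prints the next label ("Weight") rather than a value, which is the kind of wrong match the sibling-based lookup avoids.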

CodePudding user response:

Using CSS selectors, you can chain or combine your selection to make it stricter. In this case, you select the <span> that contains your string and use the adjacent sibling combinator (+) to get the next sibling <span>.

diameter = soup.select_one('.product-detail-key:-soup-contains("Wheel Diameter") + span').text

or

diameter = soup.select_one('span.product-detail-key:-soup-contains("Wheel Diameter") + span').text

Note: To avoid AttributeError: 'NoneType' object has no attribute 'text' when the element is not available, check that it exists before accessing .text (the := operator requires Python 3.8+):

diameter = e.text if (e := soup.select_one('.product-detail-key:-soup-contains("Wheel Diameter") + span')) else None

Example

from bs4 import BeautifulSoup

html_doc = """
<div>
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Select the <span> right after the label; fall back to None if nothing matches
diameter = e.text if (e := soup.select_one('.product-detail-key:-soup-contains("Wheel Diameter") + span')) else None
print(diameter)
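
Since the question mentions that a spec can appear at a different position on each page, one possible approach is to collect every label/value pair into a dictionary and look specs up by name instead of by position. The following is only a sketch under the assumption that every label is a <span> with the class product-detail-key followed by a sibling <span> holding the value; the function name extract_specs and the "Frame Material" row are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; only the "Wheel Diameter" row comes from the question.
html_doc = """
<div>
  <span class="product-detail-key">Frame Material</span>
  <span data-product-custom-field="">Steel</span>
</div>
<div>
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>
"""


def extract_specs(html):
    """Map each spec label to its value, regardless of where it appears on the page."""
    soup = BeautifulSoup(html, "html.parser")
    specs = {}
    for key in soup.select("span.product-detail-key"):
        value = key.find_next_sibling("span")
        if value is not None:  # skip labels that have no value element
            specs[key.get_text(strip=True)] = value.get_text(strip=True)
    return specs


specs = extract_specs(html_doc)
print(specs.get("Wheel Diameter"))  # lookup by name; .get() returns None if absent
```

The same function could be called once per product URL after downloading the page, and .get() avoids an exception for products that lack a given spec.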