I am using BeautifulSoup4 to extract data from a website and I am trying to find and extract data from a

Time:01-24

I am using find or find_all to first find the elements by class, and then I want to use data-tab-name to find the specific tab I am looking for. I am not sure how to do the latter.

Example

<div class="product-detail-tab" data-tab-name="X1">
x = soup.find_all(class_='product-detail-tab')

How can I then search by the 'data-tab-name' attribute to find the data under the X1 tab, then the X2 tab, and so on?

Any help would be highly appreciated!

Answer:

If I understand your question correctly, you want to find all div elements with class "product-detail-tab" and a given "data-tab-name" attribute. If so, you can use the following code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div class="product-detail-tab" data-tab-name="X1">'
    '   <span>Boo</span>'
    '</div>'
    '<div class="product-detail-tab" data-tab-name="X2">'
    '   <h6>Foo</h6>'
    '</div>',
    features="html.parser"
)

names = ['X1', 'X2']
for name in names:
    # Match on both the class and the data-tab-name attribute
    tab = soup.find(
        'div', {
            "class": "product-detail-tab",
            "data-tab-name": name
        }
    )
    print(tab)

Output:

<div class="product-detail-tab" data-tab-name="X1">   <span>Boo</span></div>
<div class="product-detail-tab" data-tab-name="X2">   <h6>Foo</h6></div>

You have to supply the tab names to search for yourself; here I put them in a list.
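If the tab names are not known in advance, one option is to let BeautifulSoup collect every div that carries the attribute. This is only a sketch: it assumes the divs have class "product-detail-tab" as in the question, and the sample HTML and the tab_text variable are made up for illustration. Passing True as an attribute value matches any element that has that attribute, whatever its value.

```python
from bs4 import BeautifulSoup

# Sample HTML invented for illustration; the real page will differ.
html = (
    '<div class="product-detail-tab" data-tab-name="X1"><span>Boo</span></div>'
    '<div class="product-detail-tab" data-tab-name="X2"><h6>Foo</h6></div>'
)
soup = BeautifulSoup(html, features="html.parser")

# True matches any div that has a data-tab-name attribute, regardless of value.
tabs = soup.find_all(
    "div", {"class": "product-detail-tab", "data-tab-name": True}
)

# Map each tab name to the text it contains.
tab_text = {tab["data-tab-name"]: tab.get_text(strip=True) for tab in tabs}
print(tab_text)  # {'X1': 'Boo', 'X2': 'Foo'}
```

The keys of tab_text can then be used in place of the hard-coded names list.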

Within each tab you can then search for other data, for example:

tab1 = soup.find(
    'div', {
        "class": "product-detail-tab",
        "data-tab-name": "X1"
    }
)
span = tab1.find("span", recursive=False)
print(span)

tab2 = soup.find(
    'div', {
        "class": "product-detail-tab",
        "data-tab-name": "X2"
    }
)
h6 = tab2.find("h6", recursive=False)
print(h6)
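The two lookups above repeat the same pattern, so they could be folded into a small helper. The sketch below uses a CSS selector with select_one instead of find; find_in_tab is a hypothetical name, not part of BeautifulSoup, and the sample HTML is again invented. The "> tag" part of the selector restricts the match to direct children, which corresponds to recursive=False.

```python
from bs4 import BeautifulSoup


def find_in_tab(soup, tab_name, tag):
    """Return the first direct child `tag` of the tab named `tab_name`, or None.

    Hypothetical helper; assumes tabs are divs with class "product-detail-tab".
    """
    selector = f'div.product-detail-tab[data-tab-name="{tab_name}"] > {tag}'
    return soup.select_one(selector)


# Sample HTML invented for illustration.
soup = BeautifulSoup(
    '<div class="product-detail-tab" data-tab-name="X1"><span>Boo</span></div>'
    '<div class="product-detail-tab" data-tab-name="X2"><h6>Foo</h6></div>',
    features="html.parser",
)

span = find_in_tab(soup, "X1", "span")
h6 = find_in_tab(soup, "X2", "h6")
missing = find_in_tab(soup, "X3", "span")  # no such tab

print(span)     # <span>Boo</span>
print(h6)       # <h6>Foo</h6>
print(missing)  # None
```

One difference from the chained find calls: when the tab does not exist, select_one simply returns None, whereas calling .find on a None result raises an AttributeError.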