Scraping product details such as brand and flavour using BeautifulSoup

Can anyone please help me scrape the Flavour and Brand details as key-value pairs using BeautifulSoup? I am new to this.

The desired output would be:

Flavour - Green Apple

Brand - Carabau

The HTML looks like this:

<tr class="a-spacing-small">
<td class="a-span3">
    <span class="a-size-base a-text-bold">Flavour</span>
</td>

<td class="a-span9">
    <span class="a-size-base">Green Apple</span>
</td>

<tr class="a-spacing-small">
<td class="a-span3">
    <span class="a-size-base a-text-bold">Brand</span>
</td>

<td class="a-span9">
    <span class="a-size-base">Carabau</span>
</td>

CodePudding user response:

from bs4 import BeautifulSoup

html = '''
    <tr class="a-spacing-small">
    <td class="a-span3">
        <span class="a-size-base a-text-bold">Flavour</span>
    </td>
    
    <td class="a-span9">
        <span class="a-size-base">Green Apple</span>
    </td>
    <tr class="a-spacing-small">
    <td class="a-span3">
        <span class="a-size-base a-text-bold">Brand</span>
    </td>
    
    <td class="a-span9">
        <span class="a-size-base">Carabau</span>
    </td>
    '''

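# Parse the markup with the parser from the standard library.
soup = BeautifulSoup(html, 'html.parser')

# Label cells use class "a-span3"; value cells use class "a-span9".
first_element = soup.find_all('td', {'class': 'a-span3'})
second_element = soup.find_all('td', {'class': 'a-span9'})

# Pair each label with its value by position and print "label - value".
for first_attribute, second_attribute in zip(first_element, second_element):
    print("{} - {}".format(first_attribute.text.strip(), second_attribute.text.strip()))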

This can be done with BeautifulSoup, and the code above produces the desired output. If you are reading the HTML from a URL, replace the hard-coded html string with the raw content you fetched.
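For the URL case, here is a minimal sketch of how the same lookup might be wired to fetched content. The helper names fetch_html and format_details, the User-Agent value, and the placeholder URL in the usage comment are my own assumptions, not part of the question. It uses urllib from the standard library, and the real page may need different headers or may block automated requests altogether.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    # Assumption: the site wants a browser-like User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def format_details(html):
    """Return one "label - value" string per label/value cell pair."""
    soup = BeautifulSoup(html, "html.parser")
    labels = soup.find_all("td", {"class": "a-span3"})
    values = soup.find_all("td", {"class": "a-span9"})
    return [
        "{} - {}".format(label.text.strip(), value.text.strip())
        for label, value in zip(labels, values)
    ]


# Usage (placeholder URL, replace with the real product page):
# for line in format_details(fetch_html("https://example.com/product")):
#     print(line)
```

Keeping the download and the parsing in separate functions lets you test the parsing against a saved HTML string without touching the network.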

CodePudding user response:

You can do it like this:

Select the table rows <tr> using .find_all(). This gives you a list of <tr> tags.
For every <tr>, get its text and print it the way you need.

Here is the complete code:

from bs4 import BeautifulSoup

s = """
<tr >
<td >
    <span >Flavour</span>
</td>

<td >
    <span >Green Apple</span>
</td>
<tr >
<td >
    <span >Brand</span>
</td>

<td >
    <span >Carabau</span>
</td>
"""
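# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(s, 'lxml')
for tr in soup.find_all('tr'):
    # stripped_strings yields each text fragment in the row with whitespace removed.
    print(' - '.join(tr.stripped_strings))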

Output:

Flavour - Green Apple
Brand - Carabau
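
Since the question asks for key-value pairs, it may also help to collect the rows into a dictionary rather than printing them directly. The sketch below is one way to do that. It assumes every row is closed with </tr> and holds the label in its first <td> and the value in its second. The function name extract_details is hypothetical.

```python
from bs4 import BeautifulSoup

# Assumption: well-formed rows, each with a label cell followed by a value cell.
html = """
<table>
  <tr class="a-spacing-small">
    <td class="a-span3"><span class="a-size-base a-text-bold">Flavour</span></td>
    <td class="a-span9"><span class="a-size-base">Green Apple</span></td>
  </tr>
  <tr class="a-spacing-small">
    <td class="a-span3"><span class="a-size-base a-text-bold">Brand</span></td>
    <td class="a-span9"><span class="a-size-base">Carabau</span></td>
  </tr>
</table>
"""


def extract_details(markup):
    """Map each row's first cell text to its second cell text."""
    soup = BeautifulSoup(markup, "html.parser")
    details = {}
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # skip rows that do not hold a label/value pair
        key = cells[0].get_text(strip=True)
        value = cells[1].get_text(strip=True)
        details[key] = value
    return details


details = extract_details(html)
for key, value in details.items():
    print("{} - {}".format(key, value))
```

Pairing the cells row by row avoids mismatches that zip over two separate lists could produce if one row were missing its value cell.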