Can anyone please help me scrape the Flavour and Brand details as key-value pairs using BeautifulSoup? I am new to this.
The desired output would be:
Flavour - Green Apple
Brand - Carabau
The HTML looks like this:
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Flavour</span>
</td>
<td class="a-span9">
<span class="a-size-base">Green Apple</span>
</td>
</tr>
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Brand</span>
</td>
<td class="a-span9">
<span class="a-size-base">Carabau</span>
</td>
</tr>
Answer:
I have taken the data as html below. You can use the find method on the respective tag to get the exact data; alternatively, you can also use find_next().
html="""<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Flavour</span>
</td>
<td class="a-span9">
<span class="a-size-base">Green Apple</span>
</td>
</tr>"""
Code:
from bs4 import BeautifulSoup
soup=BeautifulSoup(html,"html.parser")
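details = {}
# find_next() returns the next tag after the label cell, i.e. the <span> holding "Flavour"
data = soup.find("td", class_="a-span3").find_next().text
data1 = soup.find("td", class_="a-span9").find("span", class_="a-size-base").text
print(data + " - " + data1)
details[data] = data1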
Output:
Flavour - Green Apple
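The code above only reads the first row, so it never reaches the Brand pair. Below is a minimal sketch of the same find()/find_next() approach applied to every row. It assumes each row is a <tr class="a-spacing-small"> with the a-span3/a-span9 cell layout from the question, adds the closing </tr> tags the question's HTML omits, and passes "span" to find_next() to make the target explicit; `details` is just an illustrative name.

```python
from bs4 import BeautifulSoup

# Both rows from the question, with closing </tr> tags added
html = """
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Flavour</span>
</td>
<td class="a-span9">
<span class="a-size-base">Green Apple</span>
</td>
</tr>
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Brand</span>
</td>
<td class="a-span9">
<span class="a-size-base">Carabau</span>
</td>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")
details = {}
# Run the find()/find_next() lookup inside each row instead of only the first one
for row in soup.find_all("tr", class_="a-spacing-small"):
    key = row.find("td", class_="a-span3").find_next("span").text
    value = row.find("td", class_="a-span9").find("span", class_="a-size-base").text
    details[key] = value
    print(key + " - " + value)

print(details)
# {'Flavour': 'Green Apple', 'Brand': 'Carabau'}
```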
Answer:
You can do it like this: select the <tr> and use .stripped_strings to get a list of the strings inside the <tr>.
Note: if you have multiple <tr> elements, use .find_all() to select each of them and do the same.
from bs4 import BeautifulSoup
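s = """
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Flavour</span>
</td>
<td class="a-span9">
<span class="a-size-base">Green Apple</span>
</td>
</tr>
"""
soup = BeautifulSoup(s, 'lxml')
tr = soup.find('tr')
print(list(tr.stripped_strings))
# ['Flavour', 'Green Apple']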
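Following the note about multiple <tr> elements, here is a minimal sketch that turns each row's stripped strings into a dictionary entry. It assumes every row contains exactly two text pieces (label first, then value); the tuple unpacking raises ValueError if a row has more or fewer. It uses the built-in html.parser instead of lxml so no extra parser is needed.

```python
from bs4 import BeautifulSoup

html = """
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Flavour</span>
</td>
<td class="a-span9">
<span class="a-size-base">Green Apple</span>
</td>
</tr>
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Brand</span>
</td>
<td class="a-span9">
<span class="a-size-base">Carabau</span>
</td>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")
details = {}
for row in soup.find_all("tr"):
    # stripped_strings skips whitespace-only strings, leaving (label, value)
    key, value = row.stripped_strings
    details[key] = value

print(details)
```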
Answer:
There's actually no need for .stripped_strings, as suggested by Ram: you can call a specific CSS selector directly, which is safer because it grabs data only from the elements you target rather than from anything else. Also, .stripped_strings doesn't create the dictionary key-value pair you wanted.
You're looking for this:
# ...
data = []
for result in soup.select('tr'):
    # CSS selector for the flavour value
    flavor_name = result.select_one('.a-span9 .a-size-base').text
    # append to the list as a dict -> key-value pair
    data.append({
        "flavour": flavor_name
    })
print(data)
# [{'flavour': 'Green Apple'}]
Full code and example (returns a key-value pair):
from bs4 import BeautifulSoup
html = '''
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Flavour</span>
</td>
<td class="a-span9">
<span class="a-size-base">Green Apple</span>
</td>
</tr>
'''
soup = BeautifulSoup(html, 'html.parser')
# temporary list
data = []
for result in soup.select('tr'):
    # flavor = soup.select_one('.a-text-bold').text  # returns just the word "Flavour"
    flavor_name = result.select_one('.a-span9 .a-size-base').text
    data.append({
        "flavour": flavor_name
    })
print(data)
# [{'flavour': 'Green Apple'}]
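The loop above hardcodes the "flavour" key, so with the question's two rows the Brand value would also end up under a "flavour" key. Below is a sketch that also reads the label from the a-span3 cell and builds a single dict for all rows. The selectors are assumptions based on the class names in the question's HTML; rows missing either cell are skipped, because select_one() returns None when nothing matches.

```python
from bs4 import BeautifulSoup

html = """
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Flavour</span>
</td>
<td class="a-span9">
<span class="a-size-base">Green Apple</span>
</td>
</tr>
<tr class="a-spacing-small">
<td class="a-span3">
<span class="a-size-base a-text-bold">Brand</span>
</td>
<td class="a-span9">
<span class="a-size-base">Carabau</span>
</td>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")
details = {}
for row in soup.select("tr"):
    # Label span sits in the a-span3 cell, value span in the a-span9 cell
    label = row.select_one(".a-span3 .a-text-bold")
    value = row.select_one(".a-span9 .a-size-base")
    # select_one() returns None when nothing matches, so skip incomplete rows
    if label is None or value is None:
        continue
    details[label.get_text(strip=True)] = value.get_text(strip=True)

print(details)
```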
Access the created data:
for flavour in data:
    print(flavour["flavour"])
# Green Apple
P.S. I have a dedicated web scraping blog. If you need to parse search engines, give SerpApi a try.
Disclaimer: I work for SerpApi.