I am working on web scraping to extract a value from a nested HTML tag in a local HTML file. Here is a snippet of the HTML:
<table cellspacing="0" id="coveragetable">
<thead>
<tr>
<td id="a" onclick="toggleSort(this)">Element</td>
<td id="b" onclick="toggleSort(this)">Nike</td>
<td id="c" onclick="toggleSort(this)">Value.</td>
<td id="d" onclick="toggleSort(this)">Adidas</td>
<td id="e" onclick="toggleSort(this)">Value.</td>
<td id="f" onclick="toggleSort(this)">Russia</td>
<td id="g" onclick="toggleSort(this)">UAE</td>
<td id="h" onclick="toggleSort(this)">Japan</td>
<td id="i" onclick="toggleSort(this)">India</td>
</tr>
</thead>
<tfoot>
<tr>
<td>Total</td>
<td >2323</td>
<td >12%</td>
<td >233</td>
<td >61%</td>
<td >222</td>
<td >322</td>
<td >233</td>
<td >455</td>
</tr>
</tfoot>
I want to extract the 12% from <td >12%</td>. I tried the step below and got every value under <tfoot>, as <tfoot><tr><td>Total</td><td >2323</td><td >12%</td><td > 233</td>.... </tfoot>:
from bs4 import BeautifulSoup

with open('index.html', 'r') as f:
    contents = f.read()
    print(contents)

soup = BeautifulSoup(contents, 'lxml')
mydivs = soup.select_one("tfoot", {"class": "ctr2"})
print("mydivs", mydivs)
Then I tried the script below and got <td >12%</td>:
mydivs = soup.select_one('td[]')
print("mydivs", mydivs)
Please let me know what I am missing, and how to get only the 12% and also all the values in the td elements. I am using Python to extract the data.
CodePudding user response:
How to get the only 12% and also all the values in td. I am using Python to extract the data
Because the question is not very focused and the expected output is not entirely clear, there are several ways to get the data.
Get all the values in your tfoot as a list using stripped_strings:
list(soup.tfoot.stripped_strings)
#['Total', '2323', '12%', '233', '61%', '222', '322', '233', '455']
Get your explicit value by picking it by index:
list(soup.tfoot.stripped_strings)[2]
#12%
Get your explicit value directly with a CSS selector:
soup.select_one('tfoot td:nth-of-type(3)').text
#12%
or, if the cell carries the class ctr2 as your first attempt suggests:
soup.select_one('tfoot td.ctr2').text
#12%
Example
from bs4 import BeautifulSoup
html='''
<table cellspacing="0" id="coveragetable">
<thead>
<tr>
<td id="a" onclick="toggleSort(this)">Element</td>
<td id="b" onclick="toggleSort(this)">Nike</td>
<td id="c" onclick="toggleSort(this)">Value.</td>
<td id="d" onclick="toggleSort(this)">Adidas</td>
<td id="e" onclick="toggleSort(this)">Value.</td>
<td id="f" onclick="toggleSort(this)">Russia</td>
<td id="g" onclick="toggleSort(this)">UAE</td>
<td id="h" onclick="toggleSort(this)">Japan</td>
<td id="i" onclick="toggleSort(this)">India</td>
</tr>
</thead>
<tfoot>
<tr>
<td>Total</td>
<td >2323</td>
<td >12%</td>
<td >233</td>
<td >61%</td>
<td >222</td>
<td >322</td>
<td >233</td>
<td >455</td>
</tr>
</tfoot>
</table>
'''
# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'lxml')
print(list(soup.tfoot.stripped_strings))
print(list(soup.tfoot.stripped_strings)[2])
Output
['Total', '2323', '12%', '233', '61%', '222', '322', '233', '455']
12%
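The example above covers stripped_strings only. As a rough sketch of the CSS selector route, the snippet below picks the third footer cell by position, collects every footer cell, and pairs each footer value with the id of the header cell in the same column. The shortened table and the names third, footer_values and by_column are assumptions made for illustration, and html.parser is used only so the sketch runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Shortened version of the table from the question (first three columns only).
html = """
<table cellspacing="0" id="coveragetable">
<thead><tr>
<td id="a">Element</td><td id="b">Nike</td><td id="c">Value.</td>
</tr></thead>
<tfoot><tr>
<td>Total</td><td>2323</td><td>12%</td>
</tr></tfoot>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Third <td> of the footer row, selected by position.
third = soup.select_one("tfoot td:nth-of-type(3)").get_text(strip=True)

# Every footer cell, in document order.
footer_values = [td.get_text(strip=True) for td in soup.select("tfoot td")]

# Pair each header id with the footer value in the same column.
# Ids are used as keys because the header text "Value." repeats in the real table.
header_ids = [td["id"] for td in soup.select("thead td")]
by_column = dict(zip(header_ids, footer_values))

print(third)           # 12%
print(footer_values)   # ['Total', '2323', '12%']
print(by_column["c"])  # 12%
```

Keying by header id rather than by position keeps the lookup readable if columns are later reordered, provided the ids stay stable.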
CodePudding user response:
You can access the text inside the selected element using mydivs.text
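A minimal sketch of what that looks like in practice, assuming a cut-down footer row (the padded " 2323 " cell is made up to show whitespace handling). It contrasts .text with get_text(strip=True) and guards against select_one returning None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical footer row; the second cell has padding around its text.
html = "<table><tfoot><tr><td>Total</td><td> 2323 </td><td>12%</td></tr></tfoot></table>"
soup = BeautifulSoup(html, "html.parser")

cell = soup.select_one("tfoot td:nth-of-type(2)")
raw = cell.text                    # keeps surrounding whitespace
clean = cell.get_text(strip=True)  # whitespace removed

# select_one returns None when nothing matches, so check before reading .text.
missing = soup.select_one("tfoot td:nth-of-type(99)")
value = missing.get_text(strip=True) if missing is not None else None

print(repr(raw))    # ' 2323 '
print(repr(clean))  # '2323'
print(value)        # None
```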
CodePudding user response:
Try this:
mydivs = soup.select_one('td[]').text
print("mydivs", mydivs)
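The contents of the attribute selector above seem to have been lost, so here is only a sketch of attribute-based selection in general. It assumes the percentage cells carry class="ctr2" (the class named in the question's first attempt) and that other cells carry a different class; the markup below, including the "bar" class, is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names are assumed, not taken from the original file.
html = """
<table><tfoot><tr>
<td>Total</td><td class="bar">2323</td><td class="ctr2">12%</td>
<td class="bar">233</td><td class="ctr2">61%</td>
</tr></tfoot></table>
"""
soup = BeautifulSoup(html, "html.parser")

# First footer cell whose class attribute is exactly "ctr2".
first_pct = soup.select_one('tfoot td[class="ctr2"]').text

# All footer cells that have the class ctr2.
all_pct = [td.text for td in soup.select("tfoot td.ctr2")]

print(first_pct)  # 12%
print(all_pct)    # ['12%', '61%']
```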