Get a value in BeautifulSoup in nested HTML tag

Time:07-08

I am working on a web scraper to extract a value from a nested HTML tag in an HTML file. Here is a snippet of the HTML:

    <table  cellspacing="0" id="coveragetable">
  <thead>
    <tr>
      <td  id="a" onclick="toggleSort(this)">Element</td>
      <td  id="b" onclick="toggleSort(this)">Nike</td>
      <td  id="c" onclick="toggleSort(this)">Value.</td>
      <td  id="d" onclick="toggleSort(this)">Adidas</td>
      <td  id="e" onclick="toggleSort(this)">Value.</td>
      <td  id="f" onclick="toggleSort(this)">Russia</td>
      <td  id="g" onclick="toggleSort(this)">UAE</td>
      <td  id="h" onclick="toggleSort(this)">Japan</td>
      <td  id="i" onclick="toggleSort(this)">India</td>
    </tr>
  </thead>
  <tfoot>
    <tr>
      <td>Total</td>
      <td >2323</td>
      <td >12%</td>
      <td >233</td>
      <td >61%</td>
      <td >222</td>
      <td >322</td>
      <td >233</td>
       <td >455</td>
    </tr>
  </tfoot>

I want to extract the 12% from <td >12%</td>. I tried the steps below and got everything inside <tfoot> back: <tfoot><tr><td>Total</td><td >2323</td><td >12%</td><td > 233</td>.... </tfoot>

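from bs4 import BeautifulSoup

with open('index.html', 'r') as f:
    contents = f.read()
    print(contents)

    soup = BeautifulSoup(contents, 'lxml')
    # select_one()'s second positional argument is a namespaces mapping,
    # not an attribute filter, so this just returns the first <tfoot>.
    mydivs = soup.select_one("tfoot", {"class": "ctr2"})
    print("mydivs", mydivs)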

Then I tried the script below and got <td >12%</td>:

    mydivs = soup.select_one('td[class="ctr2"]')
    print("mydivs", mydivs)

Where am I going wrong, and how can I get only the 12%, as well as all the values in the td cells? I am using Python to extract the data.

CodePudding user response:

How to get the only 12% and also all the values in td. I am using Python to extract the data

Because the question is not very focused and the expected output is not entirely clear, there are several ways to get the data.


Get all the values in your tfoot as a list using stripped_strings:

list(soup.tfoot.stripped_strings)
#['Total', '2323', '12%', '233', '61%', '222', '322', '233', '455']
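One caveat: stripped_strings skips strings that are empty after stripping, so an empty cell simply disappears and shifts every later index. A minimal sketch that keeps one entry per cell by iterating over the td tags instead; the empty cell in this trimmed markup is a hypothetical addition to show the difference.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's <tfoot>; the empty <td></td> is hypothetical.
html = """
<table id="coveragetable">
  <tfoot>
    <tr>
      <td>Total</td>
      <td>2323</td>
      <td>12%</td>
      <td></td>
      <td>61%</td>
    </tr>
  </tfoot>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# stripped_strings yields only non-empty strings, so the empty cell vanishes.
by_strings = list(soup.tfoot.stripped_strings)

# One entry per <td>, with empty cells kept as "".
by_cells = [td.get_text(strip=True) for td in soup.select("tfoot td")]

print(by_strings)  # ['Total', '2323', '12%', '61%']
print(by_cells)    # ['Total', '2323', '12%', '', '61%']
```

With the per-cell list, index 2 stays the third column even if some cells are blank.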

Get the specific value by picking it by index:

list(soup.tfoot.stripped_strings)[2]
#12%
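A hard-coded index breaks if a column is added. As a sketch, the header cells' id attributes (a to i in the question's markup) can be zipped with the footer cells so a value is looked up by column id instead. This assumes the header and footer rows always have the same number of cells in the same order; ids are used rather than header text because "Value." appears twice.

```python
from bs4 import BeautifulSoup

# Trimmed version of the question's table (first five columns only).
html = """
<table id="coveragetable">
  <thead>
    <tr>
      <td id="a">Element</td>
      <td id="b">Nike</td>
      <td id="c">Value.</td>
      <td id="d">Adidas</td>
      <td id="e">Value.</td>
    </tr>
  </thead>
  <tfoot>
    <tr>
      <td>Total</td>
      <td>2323</td>
      <td>12%</td>
      <td>233</td>
      <td>61%</td>
    </tr>
  </tfoot>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

header_ids = [td["id"] for td in soup.select("thead td")]
footer_values = [td.get_text(strip=True) for td in soup.select("tfoot td")]

# Pair each header id with the footer value in the same column position.
totals = dict(zip(header_ids, footer_values))

print(totals["c"])  # 12%
print(totals)
```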

Get the specific value directly with a CSS selector:

soup.select_one('tfoot td:nth-of-type(3)').text
#12%

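or, if the cell carries the ctr2 class that the question's code refers to (the markup shown above has no class attributes):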

soup.select_one('tfoot td.ctr2').text
#12%
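The td.ctr2 selector only matches cells that actually have class="ctr2". Below is a sketch with that class added by hand to the percentage cells; which cells carry the class in the real report is an assumption here. It shows select_one() for the first match and select() for all matches.

```python
from bs4 import BeautifulSoup

# Assumed markup: the percentage cells carry class="ctr2" (hypothetical).
html = """
<table id="coveragetable">
  <tfoot>
    <tr>
      <td>Total</td>
      <td>2323</td>
      <td class="ctr2">12%</td>
      <td>233</td>
      <td class="ctr2">61%</td>
    </tr>
  </tfoot>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# First matching cell only.
first = soup.select_one("tfoot td.ctr2").get_text(strip=True)

# Every matching cell.
all_ctr2 = [td.get_text(strip=True) for td in soup.select("tfoot td.ctr2")]

print(first)     # 12%
print(all_ctr2)  # ['12%', '61%']
```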
Example
from bs4 import BeautifulSoup

html='''
  <table  cellspacing="0" id="coveragetable">
  <thead>
    <tr>
      <td  id="a" onclick="toggleSort(this)">Element</td>
      <td  id="b" onclick="toggleSort(this)">Nike</td>
      <td  id="c" onclick="toggleSort(this)">Value.</td>
      <td  id="d" onclick="toggleSort(this)">Adidas</td>
      <td  id="e" onclick="toggleSort(this)">Value.</td>
      <td  id="f" onclick="toggleSort(this)">Russia</td>
      <td  id="g" onclick="toggleSort(this)">UAE</td>
      <td  id="h" onclick="toggleSort(this)">Japan</td>
      <td  id="i" onclick="toggleSort(this)">India</td>
    </tr>
  </thead>
  <tfoot>
    <tr>
      <td>Total</td>
      <td >2323</td>
      <td >12%</td>
      <td >233</td>
      <td >61%</td>
      <td >222</td>
      <td >322</td>
      <td >233</td>
       <td >455</td>
    </tr>
  </tfoot>
</table>
'''


soup = BeautifulSoup(html, 'html.parser')

print(list(soup.tfoot.stripped_strings))

print(list(soup.tfoot.stripped_strings)[2])
Output
['Total', '2323', '12%', '233', '61%', '222', '322', '233', '455']

12%

CodePudding user response:

You can access the text inside the element (here a <tfoot> or <td>, not a div) using mydivs.text
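Keep in mind that .text joins all text under a tag, including the whitespace between child tags. A small sketch on a trimmed copy of the footer, comparing .text on the whole <tfoot> (what the first attempt in the question returns) with .text on a single <td>:

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tfoot>
    <tr>
      <td>Total</td>
      <td>2323</td>
      <td>12%</td>
    </tr>
  </tfoot>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

tfoot = soup.select_one("tfoot")
cell = soup.select_one("tfoot td:nth-of-type(3)")

# .text on <tfoot> joins every descendant string, newlines included.
print(repr(tfoot.text))
# get_text() with a separator and strip=True gives a cleaner join.
print(tfoot.get_text(" ", strip=True))  # Total 2323 12%
# .text on a single <td> is just that cell's content.
print(cell.text)  # 12%
```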

CodePudding user response:

Try this:

    mydivs = soup.select_one('td[class="ctr2"]').text
    print("mydivs", mydivs)
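select_one() returns None when nothing matches, so chaining .text raises AttributeError if the selector does not fit the markup (for example a class selector against cells without that class). A hedged sketch of a guard; the helper name cell_text is hypothetical, not part of BeautifulSoup.

```python
from typing import Optional

from bs4 import BeautifulSoup

html = "<table><tfoot><tr><td>Total</td><td>12%</td></tr></tfoot></table>"
soup = BeautifulSoup(html, "html.parser")


def cell_text(soup: BeautifulSoup, selector: str) -> Optional[str]:
    """Hypothetical helper: stripped text of the first match, or None."""
    tag = soup.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


print(cell_text(soup, "tfoot td:nth-of-type(2)"))  # 12%
print(cell_text(soup, "tfoot td.ctr2"))            # None (no such class here)
```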