How to get the text of certain elements with BeautifulSoup in Python

Time:01-31

I have HTML like this:

<tr>
  <td >...</td>
  <td >...</td>
  <td >
    <p>
      <sup>
        Name Name Name
      </sup>
    </p>
  </td>
  <td >...</td>
  <td >...</td>
  <td >
    <p>
      <sup>25.01.1980</sup>
    </p>
  </td>
  <td >...</td>
  <td >...</td>
</tr>
<tr>...</tr>
<tr>...</tr>

I need to get the text of the 3rd and 5th td in every tr.

Apparently this doesn't work:)

from bs4 import BeautifulSoup
import index

soup = BeautifulSoup(index.index_doc, 'lxml')

for i in soup.find_all('tr')[2:]:
    print(i[2].text, i[4].text)

Answer:

You could use CSS selectors with the :nth-of-type() pseudo-class to select your elements (assuming you need the date, I selected the 6th td rather than the 5th):

data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]

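select() returns matches in document order, so data alternates between the 3rd and 6th cell of each row (as long as every matched row has both). To get a list of tuples, pair the even-indexed items with the odd-indexed ones:

list(zip(data[::2], data[1::2]))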

Example

from bs4 import BeautifulSoup

html = '''
<tr>
  <td >...</td>
  <td >...</td>
  <td >
    <p>
      <sup>
        Name Name Name
      </sup>
    </p>
  </td>
  <td >...</td>
  <td >...</td>
  <td >
    <p>
      <sup>25.01.1980</sup>
    </p>
  </td>
  <td >...</td>
  <td >...</td>
</tr>
<tr>...</tr>
<tr>...</tr>
'''
# Name the parser explicitly; omitting it makes bs4 guess and emit a warning
soup = BeautifulSoup(html, 'html.parser')

data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]

# Pair each 3rd-cell text with the 6th-cell text that follows it
list(zip(data[::2], data[1::2]))
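
If some rows may lack one of the cells, the flat list above can fall out of step. A row-by-row variant avoids that. It also shows why the attempt in the question fails: indexing a Tag with i[2] looks up an attribute rather than a child, so the td cells have to be collected first. This is only a sketch; the sample markup and the extract_pairs helper are made up for illustration, and the zero-based indexes 2 and 5 assume the 3rd and 6th td are the ones wanted.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two complete rows plus one short row.
html = '''
<table>
<tr><td>a</td><td>b</td><td><p><sup> Alice </sup></p></td><td>c</td><td>d</td><td><p><sup>25.01.1980</sup></p></td></tr>
<tr><td>only one cell</td></tr>
<tr><td>a</td><td>b</td><td>Bob</td><td>c</td><td>d</td><td>02.03.1975</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')


def extract_pairs(soup, first=2, second=5):
    """Return (cells[first], cells[second]) text for each row with enough cells."""
    pairs = []
    for row in soup.find_all('tr'):
        # recursive=False keeps only the row's own cells, not cells of nested tables
        cells = row.find_all('td', recursive=False)
        if len(cells) <= max(first, second):
            continue  # skip short rows instead of raising IndexError
        pairs.append((cells[first].get_text(strip=True),
                      cells[second].get_text(strip=True)))
    return pairs


pairs = extract_pairs(soup)
print(pairs)
```

Because each tuple is built from a single row, a missing cell only drops that row and cannot shift the pairing of the rows after it.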