I have HTML like this:
<tr>
<td >...</td>
<td >...</td>
<td >
<p>
<sup>
Name Name Name
</sup>
</p>
</td>
<td >...</td>
<td >...</td>
<td >
<p>
<sup>25.01.1980</sup>
</p>
</td>
<td >...</td>
<td >...</td>
</tr>
<tr>...</tr>
<tr>...</tr>
I need to get the text of the 3rd and 5th td of every tr.
This attempt doesn't work:
from bs4 import BeautifulSoup
import index
soup = BeautifulSoup(index.index_doc, 'lxml')
for i in soup.find_all('tr')[2:]:
    print(i[2].text, i[4].text)
CodePudding user response:
You could use CSS selectors with the pseudo-class :nth-of-type() to select your elements (I assumed you need the date, so I selected the 6th td instead of the 5th):
data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]
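And to get a list of tuples, pair each name with the date that follows it (this assumes every matched row contains both cells):
list(zip(data[::2], data[1::2]))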
Example
from bs4 import BeautifulSoup
html = '''
<tr>
<td >...</td>
<td >...</td>
<td >
<p>
<sup>
Name Name Name
</sup>
</p>
</td>
<td >...</td>
<td >...</td>
<td >
<p>
<sup>25.01.1980</sup>
</p>
</td>
<td >...</td>
<td >...</td>
</tr>
<tr>...</tr>
<tr>...</tr>
'''
soup = BeautifulSoup(html, 'lxml')
data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]
list(zip(data[::2], data[1::2]))
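If you would rather keep the per-row loop from the question, a minimal sketch of that approach follows. The loop in the question fails because subscripting a Tag (i[2]) looks up an HTML attribute, not a child element, so the cells have to be collected first with find_all('td'). The sample markup, the variable names, and the choice of the 6th cell for the date are assumptions for illustration, and rows with fewer than six cells are simply skipped.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the short second row is made up
# to show how incomplete rows are handled.
html = '''
<table>
<tr>
<td>a</td><td>b</td><td><p><sup> Name Name Name </sup></p></td>
<td>c</td><td>d</td><td><p><sup>25.01.1980</sup></p></td>
<td>e</td><td>f</td>
</tr>
<tr><td>too short</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

rows = []
for tr in soup.find_all('tr'):
    # recursive=False keeps only the row's own cells, not cells of nested tables
    cells = tr.find_all('td', recursive=False)
    if len(cells) < 6:
        continue  # skip rows that lack a 3rd or 6th cell
    # Python indexes from 0, so the 3rd cell is cells[2] and the 6th is cells[5]
    rows.append((cells[2].get_text(strip=True), cells[5].get_text(strip=True)))

print(rows)
```

Working row by row keeps each name tied to the date from the same tr, so a row with a missing cell cannot shift the pairing of the rows after it.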