How can I find all tags that contain a tag with a certain class? The data is:
<tr>
<td width=17%>Tournament</td>
<td width=8%>Date</td>
<td width=6%>Pts.</td>
<td width=34%>Pos. Player (team)</td>
<td width=35%>Pos. Opponent (team)</td>
</tr>
<tr>
<td class=TDq1><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>
<td class=TDq2><a href="p.pl?t=410&r=4">17.02.02</a></td>
<td class=TDq3>34/75</td>
<td class=TDq5>39. John Deep</td>
<td class=TDq9>68. <a href="p.pl?ply=1229">Mark Deep</a></td>
</tr>
<tr>
<td class=TDp1><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>
<td class=TDp2><a href="p.pl?t=410&r=4">17.02.02</a></td>
<td class=TDp3>34/75</td>
<td class=TDp6>39. John Deep</td>
<td class=TDp8>7. <a href="p.pl?ply=10">Darius Star</a></td>
</tr>
I am trying:
for mtable in bs.find_all('tr', text=re.compile(r'class=TD?3')):
    print(mtable)
but this returns zero results.
CodePudding user response:
You need to match on the <td> elements. Like this:
In [1]: bs.find_all('td', {"class": re.compile(r'TD\w\d')})
Out[1]:
[<td class="TDq1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>,
<td class="TDq2"><a href="p.pl?t=410&r=4">17.02.02</a></td>,
<td class="TDq3">34/75</td>,
<td class="TDq5">39. John Deep</td>,
<td class="TDq9">68. <a href="p.pl?ply=1229">Mark Deep</a></td>,
<td class="TDp1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>,
<td class="TDp2"><a href="p.pl?t=410&r=4">17.02.02</a></td>,
<td class="TDp3">34/75</td>,
<td class="TDp6">39. John Deep</td>,
<td class="TDp8">7. <a href="p.pl?ply=10">Darius Star</a></td>]
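Note that this returns cells, not rows, and that TD\w\d matches every TD class rather than only those ending in 3. If the rows are what you are after, one possible follow-up is to tighten the pattern and walk up from each matching cell with find_parent. This is a minimal sketch, assuming a shortened copy of the table from the question (the html string below is illustrative, not the full page):

```python
import re
from bs4 import BeautifulSoup

# Shortened, illustrative copy of the question's table (an assumption,
# not the full page source).
html = """
<table>
<tr><td width=17%>Tournament</td><td width=6%>Pts.</td></tr>
<tr><td class=TDq1>GpWl(op) 4.01/02</td><td class=TDq3>34/75</td></tr>
<tr><td class=TDp1>GpWl(op) 4.01/02</td><td class=TDp3>34/75</td></tr>
</table>
"""

bs = BeautifulSoup(html, "html.parser")

# Anchor the pattern so only "TD" + one character + "3" is accepted.
cells = bs.find_all("td", class_=re.compile(r"^TD.3$"))

# Walk up from each matching cell to the row that encloses it.
rows = [td.find_parent("tr") for td in cells]

for tr in rows:
    print(tr)
```

If a row could hold more than one matching cell, it would appear more than once in rows, so you may want to deduplicate before printing.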
CodePudding user response:
I suppose you want to find all <tr> elements that contain any tag with class TD<any character>3:
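import re
from bs4 import BeautifulSoup

# `html` contains your html from the question
soup = BeautifulSoup(html, "html.parser")
# "TD", then any single character, then "3"
pat = re.compile(r"TD.3")
# keep each <tr> that has a descendant whose class matches the pattern
for tr in soup.find_all(
    lambda tag: tag.name == "tr"
    and tag.find(class_=lambda cl: cl and pat.match(cl))
):
    print(tr)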
Prints:
<tr>
<td class="TDq1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>
<td class="TDq2"><a href="p.pl?t=410&r=4">17.02.02</a></td>
<td class="TDq3">34/75</td>
<td class="TDq5">39. John Deep</td>
<td class="TDq9">68. <a href="p.pl?ply=1229">Mark Deep</a></td>
</tr>
<tr>
<td class="TDp1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>
<td class="TDp2"><a href="p.pl?t=410&r=4">17.02.02</a></td>
<td class="TDp3">34/75</td>
<td class="TDp6">39. John Deep</td>
<td class="TDp8">7. <a href="p.pl?ply=10">Darius Star</a></td>
</tr>
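As an alternative to the lambda filter, the same rows can probably be selected with a CSS selector through select(), which relies on the soupsieve package that ships with recent bs4 releases. This is only a sketch on a shortened, illustrative copy of the table, and it is not equivalent in every case: the attribute selectors test the whole class attribute, so a class such as TDxx3 would also match, and a cell with several classes might not.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative copy of the question's table (an assumption,
# not the full page source).
html = """
<table>
<tr><td width=17%>Tournament</td><td width=6%>Pts.</td></tr>
<tr><td class=TDq1>GpWl(op) 4.01/02</td><td class=TDq3>34/75</td></tr>
<tr><td class=TDp1>GpWl(op) 4.01/02</td><td class=TDp3>34/75</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Rows that contain a <td> whose class attribute starts with "TD"
# and ends with "3".
rows = soup.select('tr:has(td[class^="TD"][class$="3"])')

for tr in rows:
    print(tr)
```

The selector is shorter to write, while the lambda version keeps the exact "TD, one character, 3" rule, so which one fits better depends on how strict the match needs to be.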
CodePudding user response:
This may help you:
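from bs4 import BeautifulSoup
import re

t = 'your page source'
# collect the matching class names (e.g. TDq3) straight from the raw source
pat = re.compile(r'class=TD.3')
classes = re.findall(pat, t)
classes = [j[6:] for j in classes]  # strip the leading 'class='
soup = BeautifulSoup(t, 'html.parser')
result = list()
for i in classes:
    item = soup.find_all(attrs={"class": i})
    result.extend(item)
# print the parent (the <tr>) of every matching cell
for i in result:
    print(i.parent)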