BeautifulSoup: find an HTML tag that contains tags with a specific class

How can I find all tags that contain tags with a certain class? The data is:

<tr>
<td  width=17%>Tournament</td>
<td  width=8%>Date</td>
<td  width=6%>Pts.</td>
<td  width=34%>Pos. Player (team)</td>
<td  width=35%>Pos. Opponent (team)</td>
</tr>

<tr>
<td class=TDq1><a href="p.pl?t=410">GpWl(op)&nbsp;4.01/02</a></td>
<td class=TDq2><a href="p.pl?t=410&r=4">17.02.02</a></td>
<td class=TDq3>34/75</td>
<td class=TDq5>39. John Deep</td>
<td class=TDq9>68. <a href="p.pl?ply=1229">Mark Deep</a></td>
</tr>

<tr>
<td class=TDp1><a href="p.pl?t=410">GpWl(op)&nbsp;4.01/02</a></td>
<td class=TDp2><a href="p.pl?t=410&r=4">17.02.02</a></td>
<td class=TDp3>34/75</td>
<td class=TDp6>39. John Deep</td>
<td class=TDp8>7. <a href="p.pl?ply=10">Darius Star</a></td>
</tr>

I am trying

for mtable in bs.find_all('tr', text=re.compile(r'class=TD?3')):
    print(mtable)

but this returns zero results.

CodePudding user response:

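The text argument filters on a tag's string content, not on its attributes, so a pattern like class=TD?3 never matches anything. Match the class attribute of the td elements instead, like this: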

In [1]: bs.find_all('td', {"class": re.compile(r'TD\w\d')})
Out[1]: 
[<td  width="17%">Tournament</td>,
 <td  width="8%">Date</td>,
 <td  width="6%">Pts.</td>,
 <td  width="34%">Pos. Player (team)</td>,
 <td  width="35%">Pos. Opponent (team)</td>,
 <td class="TDq1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>,
 <td class="TDq2"><a href="p.pl?t=410&amp;r=4">17.02.02</a></td>,
 <td class="TDq3">34/75</td>,
 <td class="TDq5">39. John Deep</td>,
 <td class="TDq9">68. <a href="p.pl?ply=1229">Mark Deep</a></td>,
 <td class="TDp1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>,
 <td class="TDp2"><a href="p.pl?t=410&amp;r=4">17.02.02</a></td>,
 <td class="TDp3">34/75</td>,
 <td class="TDp6">39. John Deep</td>,
 <td class="TDp8">7. <a href="p.pl?ply=10">Darius Star</a></td>]
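
To get from the matched cells back to the rows the question asks for, one option is to walk up from each td to its enclosing tr. This is only a minimal sketch: sample_html is a shortened, made-up copy of the question's data, the names cells and rows are illustrative, and the pattern is narrowed to TD, one word character, then 3.

```python
import re

from bs4 import BeautifulSoup

# Shortened copy of the rows from the question (illustrative only).
sample_html = """
<tr>
<td width=17%>Tournament</td>
<td width=6%>Pts.</td>
</tr>
<tr>
<td class=TDq1><a href="p.pl?t=410">GpWl(op)</a></td>
<td class=TDq3>34/75</td>
</tr>
<tr>
<td class=TDp1><a href="p.pl?t=410">GpWl(op)</a></td>
<td class=TDp3>34/75</td>
</tr>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Match class values that are exactly "TD", one word character, then "3".
cells = soup.find_all("td", class_=re.compile(r"^TD\w3$"))

# Walk up to the enclosing <tr>, keeping each row once and in document order.
# Identity (`is`) is used so that two rows with identical content stay distinct.
rows = []
for td in cells:
    tr = td.find_parent("tr")
    if tr is not None and not any(tr is seen for seen in rows):
        rows.append(tr)

for tr in rows:
    print(tr)
```

The header row is skipped here because its cells carry no class attribute in the sample.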

CodePudding user response:

I suppose you want to find every <tr> that contains any tag with class TD<any character>3:

import re
from bs4 import BeautifulSoup

# `html` contains your html from the question
soup = BeautifulSoup(html, "html.parser")
pat = re.compile(r"TD.3")

for tr in soup.find_all(
    lambda tag: tag.name == "tr"
    and tag.find(class_=lambda cl: cl and pat.match(cl))
):
    print(tr)

Prints:

<tr>
<td class="TDq1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>
<td class="TDq2"><a href="p.pl?t=410&amp;r=4">17.02.02</a></td>
<td class="TDq3">34/75</td>
<td class="TDq5">39. John Deep</td>
<td class="TDq9">68. <a href="p.pl?ply=1229">Mark Deep</a></td>
</tr>
<tr>
<td class="TDp1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>
<td class="TDp2"><a href="p.pl?t=410&amp;r=4">17.02.02</a></td>
<td class="TDp3">34/75</td>
<td class="TDp6">39. John Deep</td>
<td class="TDp8">7. <a href="p.pl?ply=10">Darius Star</a></td>
</tr>
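
A CSS selector may express the same idea more compactly. The sketch below assumes a bs4 version whose select() supports :has() (4.7 or later, where selectors are handled by the soupsieve package); sample_html is a shortened, made-up copy of the question's data. Note that the selector is looser than the regular expression: it accepts any class attribute that starts with TD and ends with 3, however many characters sit in between.

```python
from bs4 import BeautifulSoup

# Shortened copy of the rows from the question (illustrative only).
sample_html = """
<tr>
<td width=17%>Tournament</td>
<td width=6%>Pts.</td>
</tr>
<tr>
<td class=TDq1><a href="p.pl?t=410">GpWl(op)</a></td>
<td class=TDq3>34/75</td>
</tr>
<tr>
<td class=TDp1><a href="p.pl?t=410">GpWl(op)</a></td>
<td class=TDp3>34/75</td>
</tr>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# :has() keeps only the <tr> elements that contain a <td> whose class
# attribute starts with "TD" and ends with "3".
rows = soup.select('tr:has(td[class^="TD"][class$="3"])')

for tr in rows:
    print(tr)
```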

CodePudding user response:

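Another option is to pull the class names out of the raw page source with a regular expression, look each one up with BeautifulSoup, and print the parent row. This relies on the class attributes being unquoted, as they are in your sample: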

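from bs4 import BeautifulSoup
import re

t = 'your page source'  # replace with the HTML from the question
# Find the raw class=TD<any character>3 attributes in the source text.
pat = re.compile(r'class=TD.3')
classes = re.findall(pat, t)
# Drop the leading 'class=' (6 characters) and remove repeated names,
# so a class used in several rows is only looked up once.
classes = list(dict.fromkeys(j[6:] for j in classes))
# Name the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(t, 'html.parser')
result = list()
for i in classes:
    # Collect every tag that carries this class.
    item = soup.find_all(attrs={"class": i})
    result.extend(item)
for i in result:
    # The parent of each matching <td> is its enclosing <tr>.
    print(i.parent)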