BeautifulSoup not finding all tables in xml

Time:02-16

I'm new to Python and scraping. I'm trying to extract the data from this XML (https://www.boe.es/diario_boe/xml.php?id=BOE-A-2022-2225):

<p >A) CIGARROS Y CIGARRITOS</p>
<table >
<thead>
<tr>
<th > </th>
<th >
<p >PVP</p>
<p >–</p>
<p >Euros/Unidad</p>
</th>
</tr>
</thead>
<colgroup>
<col width="60%"/>
<col width="16%"/>
</colgroup>
<tbody>
<tr>
<td  colspan="2">
<em>A. FLORES</em>
</td>
</tr>
<tr>
<td >A. Flores Gran Reserva Connecticut Valley Reserve Robusto C (10).</td>
<td >12,95</td>

I'm trying to get the td texts, but when I export the data to an Excel file I only get the first section of the XML ((A) CIGARROS Y CIGARRITOS). The full file has more sections ((B) CIGARS, (C)...).

This is what I have so far:

table = soup.find('table', {'class':'tabla'})
columns = [i.get_text(strip=True) for i in table.find_all("th")]
data = []

for tr in table.find("tbody").find_all("tr"):
    data.append([td.get_text(strip=True) for td in tr.find_all("td")])

df = pd.DataFrame(data, columns=columns)

df.to_excel("data.xlsx", index=False)

I tried find_all() instead of find() for the table, but got the error: ResultSet object has no attribute 'find_all'. Any help?

Answer:

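find() returns only the first matching table, which is why only section A) ends up in the export. find_all() returns a ResultSet, which is a list of tags rather than a single tag, so it has no find_all() of its own. To get all tables and avoid the error, iterate over the ResultSet and call find()/find_all() on each table: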

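import pandas as pd

# soup is assumed to be the BeautifulSoup object already built from the XML
data = []
for t in soup.find_all('table', {'class':'tabla'}):
    # header texts of this table; they differ between sections
    columns = [i.get_text(strip=True) for i in t.find_all("th")]

    for tr in t.find("tbody").find_all("tr"):
        # one dict per row, so pandas can line up differing columns
        data.append(dict(zip(columns,[td.get_text(strip=True) for td in tr.find_all("td")])))

df = pd.DataFrame(data)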

Output:

|     |  | PVP–Euros/Unidad | PVP–Euros/Envase |
| 0   | A. FLORES | nan | nan |
| 1   | A. Flores Gran Reserva Connecticut Valley Reserve Robusto C (10). | 12,95 | nan |
| 2   | AJ. FERNANDEZ | nan | nan |
| 3   | Aj. Fernandez New World Cameroon Selection Gordo C (20). | 9,95 | nan |
| 4   | Aj. Fernandez New World Oscuro Virrey Gordo 6 X 58 C (21). | 8,95 | nan |
| ... | ... | ... | ... |
| 115 | Eternal Smoke Red Lips (50 g). | 2,10 | nan |
| 116 | Eternal Smoke Wild Lit (50 g). | 2,10 | nan |
| 117 | Forever Gold Border (200 g). | 12,00 | nan |
| 118 | Forever Gold Border (50 g). | 3,50 | nan |
| 119 | Forever Gold Border Edición Limitada (50 g). | 3,50 | nan |
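
If you want to try the loop without downloading the file, here is a minimal self-contained sketch. The SAMPLE string is a made-up, shortened stand-in for the BOE document (its product names and second section title are placeholders, not real data), and it assumes the tables carry class="tabla" as in the selector above. It uses the built-in "html.parser" only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the real document (placeholder data).
SAMPLE = """
<texto>
<p>A) CIGARROS Y CIGARRITOS</p>
<table class="tabla">
<thead><tr><th> </th><th><p>PVP</p><p>–</p><p>Euros/Unidad</p></th></tr></thead>
<tbody>
<tr><td colspan="2"><em>A. FLORES</em></td></tr>
<tr><td>Sample Cigar C (10).</td><td>12,95</td></tr>
</tbody>
</table>
<p>B) SAMPLE SECOND SECTION</p>
<table class="tabla">
<thead><tr><th> </th><th><p>PVP</p><p>–</p><p>Euros/Envase</p></th></tr></thead>
<tbody>
<tr><td>Sample Pouch (50 g).</td><td>2,10</td></tr>
</tbody>
</table>
</texto>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# find_all() returns a ResultSet: a list of Tag objects. It supports len()
# and iteration, but it has no find()/find_all() methods of its own.
tables = soup.find_all("table", {"class": "tabla"})
print(len(tables))

rows = []
for table in tables:
    # Each element of the ResultSet is a Tag, so find()/find_all() work here.
    columns = [th.get_text(strip=True) for th in table.find_all("th")]
    for tr in table.find("tbody").find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # zip() stops at the shorter sequence, so a colspan row with a
        # single cell only fills the first column.
        rows.append(dict(zip(columns, cells)))

for row in rows:
    print(row)
```

Passing rows to pd.DataFrame() then gives one frame covering every table, with NaN wherever a row has no value for a given column.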