I'm new to Python and scraping. I'm trying to extract the data from this XML document (https://www.boe.es/diario_boe/xml.php?id=BOE-A-2022-2225). Here is an excerpt:
```xml
<p >A) CIGARROS Y CIGARRITOS</p>
<table >
  <thead>
    <tr>
      <th > </th>
      <th >
        <p >PVP</p>
        <p >–</p>
        <p >Euros/Unidad</p>
      </th>
    </tr>
  </thead>
  <colgroup>
    <col width="60%"/>
    <col width="16%"/>
  </colgroup>
  <tbody>
    <tr>
      <td colspan="2">
        <em>A. FLORES</em>
      </td>
    </tr>
    <tr>
      <td >A. Flores Gran Reserva Connecticut Valley Reserve Robusto C (10).</td>
      <td >12,95</td>
```
I'm trying to get the text of the `td` cells, but when I export the data to Excel I only get the first section of the document ((A) CIGARROS Y CIGARRITOS). The full file has more sections ((B) CIGARS, (C)...).
This is what I have so far:
```python
import pandas as pd

# `soup` is the parsed BOE page (how it was built is not shown here)
table = soup.find('table', {'class': 'tabla'})
columns = [i.get_text(strip=True) for i in table.find_all("th")]
data = []
for tr in table.find("tbody").find_all("tr"):
    data.append([td.get_text(strip=True) for td in tr.find_all("td")])
df = pd.DataFrame(data, columns=columns)
df.to_excel("data.xlsx", index=False)
```
I tried using `find_all()` instead of `find()` for the table, but got the error `ResultSet object has no attribute 'find_all'`. Any help?
Answer:
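To get all the tables and avoid the error when using `find_all()`, you have to iterate over the `ResultSet`. `find_all()` returns a list of `Tag` objects, and `find()`/`find_all()` are methods of each individual `Tag`, not of that list.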
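A minimal sketch on a made-up two-table HTML string (a hypothetical sample, not the real BOE markup) shows the difference: `find()` stops at the first table, `find_all()` returns a `ResultSet`, calling `find_all()` on that `ResultSet` raises the error from the question, and looping over it works.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two small tables with class "tabla", not the real BOE file
html = """
<p>A) CIGARROS Y CIGARRITOS</p>
<table class="tabla"><tbody>
<tr><td>Cigar 1</td><td>12,95</td></tr>
<tr><td>Cigar 2</td><td>9,95</td></tr>
</tbody></table>
<p>B) CIGARS</p>
<table class="tabla"><tbody>
<tr><td>Cigar 3</td><td>8,95</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching Tag, so only section A is seen
first = soup.find("table", {"class": "tabla"})
print(len(first.find_all("tr")))

# find_all() returns a ResultSet: a list of Tags, one per table
tables = soup.find_all("table", {"class": "tabla"})
print(type(tables).__name__, len(tables))

# Calling find_all() on the ResultSet itself raises AttributeError
try:
    tables.find_all("tr")
    error = None
except AttributeError as exc:
    error = exc  # keep a reference; `exc` is cleared after the except block
    print("AttributeError:", exc)

# Iterating over the ResultSet calls find_all() on each Tag instead
rows = [tr for t in tables for tr in t.find_all("tr")]
print(len(rows))
```

The answer's code below applies the same per-table loop to the real document.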
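```python
import pandas as pd

data = []
# find_all() returns a ResultSet, so loop over it and search inside each table
for t in soup.find_all('table', {'class': 'tabla'}):
    columns = [i.get_text(strip=True) for i in t.find_all("th")]
    for tr in t.find("tbody").find_all("tr"):
        # zip() pairs each header with its cell; headers a row lacks end up as NaN
        data.append(dict(zip(columns, [td.get_text(strip=True) for td in tr.find_all("td")])))

df = pd.DataFrame(data)
```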
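To try the answer's loop without downloading the BOE file, here is a self-contained sketch run on a hypothetical two-table sample modeled on the question's snippet. The second table's `Euros/Envase` header is an assumption taken from the output below; it illustrates why rows from a table that lacks a given header get `NaN` in that column.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample modeled on the question's snippet (not the real BOE file);
# the second table uses a different price header
html = """
<table class="tabla">
  <thead><tr><th> </th><th><p>PVP</p><p>–</p><p>Euros/Unidad</p></th></tr></thead>
  <tbody>
    <tr><td colspan="2"><em>A. FLORES</em></td></tr>
    <tr><td>A. Flores Gran Reserva Connecticut Valley Reserve Robusto C (10).</td><td>12,95</td></tr>
  </tbody>
</table>
<table class="tabla">
  <thead><tr><th> </th><th><p>PVP</p><p>–</p><p>Euros/Envase</p></th></tr></thead>
  <tbody>
    <tr><td>Forever Gold Border (50 g).</td><td>3,50</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

data = []
for t in soup.find_all("table", {"class": "tabla"}):
    # Each table's own headers become the keys for its rows
    columns = [th.get_text(strip=True) for th in t.find_all("th")]
    for tr in t.find("tbody").find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        data.append(dict(zip(columns, cells)))

# pandas takes the union of all keys; a key missing from a row becomes NaN
df = pd.DataFrame(data)
print(df)
# df.to_excel("data.xlsx", index=False)  # needs an Excel writer such as openpyxl
```

The brand rows (`colspan="2"`) contain a single `td`, so only the first, blank-named column is filled for them.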
Output (`df`):

|     |     | PVP–Euros/Unidad | PVP–Euros/Envase |
|---|---|---|---|
| 0 | A. FLORES | nan | nan |
| 1 | A. Flores Gran Reserva Connecticut Valley Reserve Robusto C (10). | 12,95 | nan |
| 2 | AJ. FERNANDEZ | nan | nan |
| 3 | Aj. Fernandez New World Cameroon Selection Gordo C (20). | 9,95 | nan |
| 4 | Aj. Fernandez New World Oscuro Virrey Gordo 6 X 58 C (21). | 8,95 | nan |
| ... | ... | ... | ... |
| 115 | Eternal Smoke Red Lips (50 g). | 2,10 | nan |
| 116 | Eternal Smoke Wild Lit (50 g). | 2,10 | nan |
| 117 | Forever Gold Border (200 g). | 12,00 | nan |
| 118 | Forever Gold Border (50 g). | 3,50 | nan |
| 119 | Forever Gold Border Edición Limitada (50 g). | 3,50 | nan |
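The combined DataFrame no longer records which section ((A), (B), ...) each row came from. One possible refinement, assuming each table is directly preceded by a sibling `<p>` holding the section title as in the question's excerpt (not verified for the whole BOE file), is to store that title in an extra column. The sample HTML and the `section` column name below are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample: each section title is a <p> sibling right before its table
html = """
<p>A) CIGARROS Y CIGARRITOS</p>
<table class="tabla">
  <thead><tr><th> </th><th>PVP</th></tr></thead>
  <tbody><tr><td>Cigar 1</td><td>12,95</td></tr></tbody>
</table>
<p>B) CIGARS</p>
<table class="tabla">
  <thead><tr><th> </th><th>PVP</th></tr></thead>
  <tbody><tr><td>Cigar 2</td><td>9,95</td></tr><tr><td>Cigar 3</td><td>8,95</td></tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

data = []
for t in soup.find_all("table", {"class": "tabla"}):
    # Assumption: the section title is the nearest preceding <p> sibling of the table
    heading = t.find_previous_sibling("p")
    section = heading.get_text(strip=True) if heading else None
    columns = [th.get_text(strip=True) for th in t.find_all("th")]
    for tr in t.find("tbody").find_all("tr"):
        row = dict(zip(columns, [td.get_text(strip=True) for td in tr.find_all("td")]))
        row["section"] = section  # hypothetical column name
        data.append(row)

df = pd.DataFrame(data)
print(df)
```

`find_previous_sibling()` only looks at siblings, so the `<p>` elements inside each table's `th` are not picked up. The result can then be exported with `df.to_excel("data.xlsx", index=False)` as in the question.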