<td> <label for="cp_designation">Designation : </label></td>
<td> PARTNER</td>
</tr>
<tr>
<td><label for="cp_category">Category : </label></td>
<td>SPORTS GEARS</td>
</tr>
<tr>
<td> <label for="cp_address">Address : </label></td>
<td> A-148, WARD NO.4, PAINTER STREETSIALKOT-CANTT.</td>
</tr>
<tr>
<td> <label for="cp_phone">Phone : </label></td>
<td> 4603886,</td>
</tr>
soup = bs(page.content, "html.parser")
for i in soup:
    label = soup.find_all('label', text='Designation : ')
    print(label.find('tr'))
Hi all, I want to extract the value that belongs to each label, which sits in the tag next to it. I have tried many things but failed to get the value. If anyone has experience with this, help would be highly appreciated. Thanks in advance.
CodePudding user response:
Here you can find each main tr tag with the find_all method, take the label tag in each row to build key-value pairs, and use find_next on the label to get the td tag that follows it and holds the label's value.
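from bs4 import BeautifulSoup

# html holds the page markup, e.g. page.content from the question
soup = BeautifulSoup(html, "html.parser")
dict1 = {}
for row in soup.find_all("tr"):
    label = row.find("label")
    if label is None:  # skip rows that have no label
        continue
    # the value sits in the next td after the label's own cell
    dict1[label.get_text(strip=True)] = label.find_next("td").get_text(strip=True)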
Output:
{'Designation :': 'PARTNER',
'Category :': 'SPORTS GEARS',
'Address :': 'A-148, WARD NO.4, PAINTER STREETSIALKOT-CANTT.',
'Phone :': '4603886,'}
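If only one field is needed, as in the question's attempt with 'Designation : ', the same find_next idea can be applied to a single label. The following is a minimal sketch, not the answerer's code: value_for and SAMPLE_HTML are hypothetical names, the sample markup is a cut-down copy of the rows from the question wrapped in a table, and matching on the label's for attribute is an assumption that avoids depending on the exact whitespace of the label text.

```python
from bs4 import BeautifulSoup

# Cut-down copy of the question's rows, wrapped in a table (assumed layout).
SAMPLE_HTML = """
<table>
<tr>
<td> <label for="cp_designation">Designation : </label></td>
<td> PARTNER</td>
</tr>
<tr>
<td><label for="cp_category">Category : </label></td>
<td>SPORTS GEARS</td>
</tr>
</table>
"""


def value_for(soup, field_id):
    """Return the text of the td that follows the label whose for attribute is field_id."""
    label = soup.find("label", attrs={"for": field_id})
    if label is None:
        return None  # no such label on the page
    cell = label.find_next("td")  # next td in document order, i.e. the value cell
    return cell.get_text(strip=True) if cell else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_for(soup, "cp_designation"))  # PARTNER
print(value_for(soup, "cp_missing"))      # None
```

Returning None for a missing label keeps the caller from hitting an AttributeError, which is what happens when find or find_next is chained on a result that is not a single tag.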
CodePudding user response:
The code below takes a list of the headers and a list of the table rows, zips the headers with the text of each row's table data (td) cells, converts each result to a dictionary, and appends it to a list.
This isn't the most robust way of scraping, since you can hit issues where data is missing or sits in the wrong location, but you can adapt the code below to be more robust.
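from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
all_data = []
table = soup.find('table')
# assumes the table has a header row of th cells; the snippet in the question does not show one
headers = [i.text for i in table.find_all('th')]
rows = table.find_all('tr')
for row in rows:
    table_data_text = [i.text for i in row.find_all('td')]
    if not table_data_text:  # skip the header row, which has no td cells
        continue
    output_dict = dict(zip(headers, table_data_text))
    all_data.append(output_dict)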
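One possible way to adapt this for missing or misplaced data, given that the rows in the question hold the field name and its value in two td cells rather than in th headers, is to pair the cells row by row and skip any row that does not have exactly two. This is only a sketch under that assumption: rows_to_dict and SAMPLE_HTML are hypothetical names, and the single-cell row is added to the sample purely to show the skip.

```python
from bs4 import BeautifulSoup

# Assumed layout: each useful row has a name cell and a value cell.
SAMPLE_HTML = """
<table>
<tr>
<td> <label for="cp_designation">Designation : </label></td>
<td> PARTNER</td>
</tr>
<tr>
<td>row with a single cell</td>
</tr>
<tr>
<td> <label for="cp_phone">Phone : </label></td>
<td> 4603886,</td>
</tr>
</table>
"""


def rows_to_dict(html):
    """Map the first cell of each two-cell row to the second cell."""
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 2:
            continue  # missing or extra data: skip rather than misalign
        key, value = cells
        record[key.rstrip(" :")] = value  # drop the trailing " :" from the name
    return record


print(rows_to_dict(SAMPLE_HTML))
# {'Designation': 'PARTNER', 'Phone': '4603886,'}
```

Skipping malformed rows is a design choice; logging them or storing None for the value would work as well if silent gaps are a concern.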