How to get a tag value that is wrapped in a table?

<tr>
    <td> <label for="cp_designation">Designation : </label></td>
    <td> PARTNER</td>
</tr>
<tr>
    <td><label for="cp_category">Category : </label></td>
    <td>SPORTS GEARS</td>
</tr>
<tr>
    <td> <label for="cp_address">Address : </label></td>
    <td> A-148, WARD NO.4, PAINTER STREETSIALKOT-CANTT.</td>
</tr>
<tr>
    <td> <label for="cp_phone">Phone  : </label></td>
    <td> 4603886,</td>
</tr>
                            
soup = bs(page.content, "html.parser")
for i in soup:
  label = soup.find_all('label',text='Designation : ')
  print(label.find('tr'))

Hi all, I want to extract the value that goes with each label in this table. I have tried many things but failed to get the value. If anyone has experience with this, any help would be highly appreciated. Thanks in advance.

CodePudding user response：

You can find every tr tag with the find_all method and, for each row, look up its label tag. Then use find_next on the label to get the td that follows it. The label text becomes the key and the text of that td becomes the value, so the data ends up as key-value pairs.

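from bs4 import BeautifulSoup

# html holds the table markup shown in the question
soup = BeautifulSoup(html, "html.parser")
dict1 = {}
for row in soup.find_all("tr"):
    label = row.find("label")
    if label is None:  # skip rows that have no label
        continue
    # the td that follows the label holds the value
    dict1[label.get_text(strip=True)] = label.find_next("td").get_text(strip=True)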

Output:

{'Designation :': 'PARTNER',
 'Category :': 'SPORTS GEARS',
 'Address :': 'A-148, WARD NO.4, PAINTER STREETSIALKOT-CANTT.',
 'Phone  :': '4603886,'}
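
If only one field is needed, as in the question's attempt, the same find_next idea works on a single label. Below is a minimal sketch, assuming the markup from the question (shortened here into an `html` string and wrapped in a table tag for completeness). The helper name `value_for` is made up for illustration. Looking the label up by its `for` attribute avoids having to match the exact whitespace in the label text.

```python
from bs4 import BeautifulSoup

# Assumption: a shortened copy of the rows shown in the question.
html = """
<table>
  <tr>
    <td> <label for="cp_designation">Designation : </label></td>
    <td> PARTNER</td>
  </tr>
  <tr>
    <td><label for="cp_category">Category : </label></td>
    <td>SPORTS GEARS</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def value_for(soup, label_for):
    """Return the text of the td that follows the label, or None if absent."""
    # find() returns a single tag (or None); find_all() returns a list-like
    # ResultSet, which has no find() method of its own.
    label = soup.find("label", attrs={"for": label_for})
    if label is None:
        return None
    # The label sits inside the first td, so the next td in document
    # order is the cell holding the value.
    cell = label.find_next("td")
    return cell.get_text(strip=True) if cell else None


designation = value_for(soup, "cp_designation")
missing = value_for(soup, "cp_fax")  # hypothetical id that is not in the markup
print(designation)
```

The original attempt fails for two reasons that this sketch sidesteps: find_all returns a list of tags rather than one tag, and a label contains no tr, so searching inside it for one finds nothing.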

CodePudding user response：

Here we collect the header cells into a list, collect the table rows, and zip the headers with the text of each row's td cells. Each zipped row is converted to a dictionary and appended to a list. Note that this assumes the table has th header cells, which the snippet in the question does not show.

This isn't the most robust way to scrape: it can break when a cell is missing or when data sits in an unexpected position. The code below is a starting point that you can adapt to be more robust.

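from bs4 import BeautifulSoup

# html holds the page markup containing the table
soup = BeautifulSoup(html, "html.parser")

all_data = []

table = soup.find('table')
# column headers, taken from the th cells
headers = [i.text for i in table.find_all('th')]
rows = table.find_all('tr')

for row in rows:
    # text of every td cell in this row
    table_data_text = [i.text for i in row.find_all('td')]
    # pair each header with the cell in the same position
    output_dict = dict(zip(headers, table_data_text))
    all_data.append(output_dict)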
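
For a layout like the one in the question, where each row holds a label cell followed by a value cell and there are no th header cells, the same idea can be turned on its side: the first cell of a row is the key and the second is the value. The following is a sketch under that assumption, not a general solution. The `html` string and the `rows_to_dict` helper are hypothetical names introduced here.

```python
from bs4 import BeautifulSoup

# Assumption: two rows copied from the question, plus one malformed row
# to show how rows that do not fit the pattern are skipped.
html = """
<table>
  <tr>
    <td> <label for="cp_address">Address : </label></td>
    <td> A-148, WARD NO.4, PAINTER STREETSIALKOT-CANTT.</td>
  </tr>
  <tr>
    <td> <label for="cp_phone">Phone  : </label></td>
    <td> 4603886,</td>
  </tr>
  <tr><td>row with a single cell</td></tr>
</table>
"""


def rows_to_dict(markup):
    """Map the first td of each two-cell row to the second td."""
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Guard against missing or extra cells instead of mis-pairing them.
        if len(cells) != 2:
            continue
        key, value = cells
        # Drop the trailing " :" that the labels carry.
        result[key.rstrip(" :")] = value
    return result


data = rows_to_dict(html)
print(data)
```

Checking the cell count before unpacking is one way to handle the missing-data problem mentioned above: a row that does not match the expected shape is skipped rather than paired with the wrong key.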