How to use a two-part key to scrape HTML?

Time:06-28

For example, I have the following HTML code:

   ...   
        <tr class="main" data-year="Month">...</tr>
        <tr class="main" data-year="Month">...</tr>
        <tr class="main" data-year="Month">...</tr>
        ...
        <tr class="main" data-year="Month">
          <td class="month" title="" data-x-key="name">June</td>
          <td class="month" title="" data-x-key="volume">100</td>
          <td class="month" title="" data-x-key="date">06/27/2022</td>
        </tr>
        ...
        <tr class="main" data-year="Month">...</tr>
    ...

I have parsing code, but I want to change it. My question is: how can I select each cell by its data-x-key attribute instead of repeating find_next('td', class_='month')?

    ...
        soup = BeautifulSoup(html, 'html.parser')
        item = soup.find_all('tr', class_='main')
        data = []
        for i in item:     
            data.append({
                        'name': i.find('td', class_='month').get_text(),
                        'volume': i.find('td', class_='month').find_next('td', class_='month').get_text(),
                        'date': i.find('td', class_='month').find_next('td', class_='month').find_next('td', 
                                class_='month').get_text()
                        })
        print(data)    
    ...

CodePudding user response:

Try CSS attribute selectors with select_one(), which match each cell directly by its data-x-key value:

from bs4 import BeautifulSoup

html='''
<tr class="main" data-year="Month">
    <td class="month" title="" data-x-key="name">June</td>
    <td class="month" title="" data-x-key="volume">100</td>
    <td class="month" title="" data-x-key="date">06/27/2022</td>
</tr>
'''

soup = BeautifulSoup(html, 'html.parser')
# Each data row is a <tr class="main">
item = soup.find_all('tr', class_='main')
data = []
for i in item:
    # Select each cell by its data-x-key attribute instead of chaining find_next()
    data.append({
        'name': i.select_one('td[data-x-key="name"]').get_text(),
        'volume': i.select_one('td[data-x-key="volume"]').get_text(),
        'date': i.select_one('td[data-x-key="date"]').get_text()})
print(data)

Output:

[{'name': 'June', 'volume': '100', 'date': '06/27/2022'}]
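If the set of keys may change, the same idea can be generalized by reading the key name from the attribute itself. The following is only a sketch: the helper name row_to_dict is hypothetical, and the class="main" / class="month" attributes in the sample HTML are assumed from the class_ arguments used in the question's parsing code.

```python
from bs4 import BeautifulSoup

# Sample row; the class attributes are an assumption based on the
# class_='main' and class_='month' lookups in the question.
html = '''
<table>
<tr class="main" data-year="Month">
    <td class="month" title="" data-x-key="name">June</td>
    <td class="month" title="" data-x-key="volume">100</td>
    <td class="month" title="" data-x-key="date">06/27/2022</td>
</tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')


def row_to_dict(row):
    """Hypothetical helper: map each cell's data-x-key value to its text."""
    return {
        td['data-x-key']: td.get_text(strip=True)
        for td in row.select('td[data-x-key]')  # only cells that carry the attribute
    }


# One dict per <tr class="main"> row
data = [row_to_dict(row) for row in soup.select('tr.main')]
print(data)
```

With this approach a row that lacks one of the cells simply yields a dict without that key, whereas calling get_text() on the result of select_one() would raise an AttributeError when the cell is missing.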