Extract class description between <td>

Time:08-25

I am doing some web scraping using Beautiful Soup in Python. How can I extract the class information inside a <td> when there is no text provided? See the example I am working on. I'd like Beautiful Soup to give me the text mm_detail_N, mm_detail_N, mm_detail_SE.

<tr>
<td >Direction du vent</td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_SE" title="title.wind_SE"></div></center></td>
</tr>

I usually use the following command:

data = [i.get_text(strip=True) for i in soup.find_all("td", {"title": "title_of_the_td"})]

I have tried the following command:

data = [i.get_text(strip=True) for i in soup.find_all("div", {"title": "caption_of_the_td"})]

The command executes without errors, but the result is empty.

Any ideas?

CodePudding user response:

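As you mentioned, you would like to extract mm_detail_N, mm_detail_N, mm_detail_SE. These values live in the class attribute of each div, not in its text, which is why get_text() returns nothing. You can select every div whose class contains mm_detail with the CSS selector div[class*="mm_detail"], then call .get('class') on each match to read the attribute value. Note that .get('class') returns a list, because class is a multi-valued attribute: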

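from bs4 import BeautifulSoup

html_doc = '''
<tr>
<td >Direction du vent</td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_SE" title="title.wind_SE"></div></center></td>
</tr>
'''

# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(html_doc, 'lxml')

# Select every div whose class attribute contains "mm_detail"
for div in soup.select('div[class*="mm_detail"]'):
    # class is multi-valued, so .get('class') returns a list of names
    print(div.get('class'))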

Output:

['mm_detail_N']
['mm_detail_N']
['mm_detail_SE']
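
If you would rather have plain strings than the one-element lists printed above, you can join each class list inside a list comprehension, in the same style as the command from the question. This is only a minimal sketch: it assumes the divs carry class="mm_detail_…" as the question describes, and it uses the built-in 'html.parser' instead of 'lxml' (my choice, so that no extra package is needed). It also shows why get_text() came back empty, and how to read the title attribute the same way.

```python
from bs4 import BeautifulSoup

# Sample markup; the class values are assumed from the question's expected output
html_doc = '''
<tr>
<td>Direction du vent</td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_SE" title="title.wind_SE"></div></center></td>
</tr>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
divs = soup.select('div[class*="mm_detail"]')

# The divs contain no text, so get_text() yields empty strings
texts = [div.get_text(strip=True) for div in divs]

# .get('class') returns a list of class names; join it into one string
classes = [' '.join(div.get('class', [])) for div in divs]

# title is a single-valued attribute, so .get('title') returns a string
titles = [div.get('title') for div in divs]

print(texts)    # ['', '', '']
print(classes)  # ['mm_detail_N', 'mm_detail_N', 'mm_detail_SE']
print(titles)   # ['title.wind_N', 'title.wind_N', 'title.wind_SE']
```

Joining with a space keeps the result correct even if a div has more than one class.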