I am doing some web scraping with Beautiful Soup in Python. How can I extract the class attribute of the elements inside each <td> when they contain no text? See the example I am working on. I'd like Beautiful Soup to give me the text mm_detail_N, mm_detail_N, mm_detail_SE.
<tr>
<td >Direction du vent</td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_SE" title="title.wind_SE"></div></center></td>
</tr>
I usually use the following command:
data = [i.get_text(strip=True) for i in soup.find_all("td", {"title": "title_of_the_td"})]
I have tried the following command:
data = [i.get_text(strip=True) for i in soup.find_all("div", {"title": "caption_of_the_td"})]
The command runs without error, but the result is empty.
Any ideas?
CodePudding user response:
Since you want to extract mm_detail_N, mm_detail_N, mm_detail_SE, you can select the divs whose class attribute contains the common substring with the CSS selector div[class*="mm_detail"], then call .get('class') on each match to read the value. Beautiful Soup treats class as a multi-valued attribute, so each result is a list:
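from bs4 import BeautifulSoup

# Sample markup; the class attributes are the values we want to read.
html_doc = '''
<tr>
<td >Direction du vent</td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_SE" title="title.wind_SE"></div></center></td>
</tr>
'''

soup = BeautifulSoup(html_doc, 'lxml')
# Match every div whose class attribute contains "mm_detail".
for div in soup.select('div[class*="mm_detail"]'):
    print(div.get('class'))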
Output:
['mm_detail_N']
['mm_detail_N']
['mm_detail_SE']
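If you want plain strings rather than one-element lists, here is a minimal sketch of one way to do it. It assumes the divs carry class="mm_detail_..." attributes as in the answer above, and it uses the built-in html.parser instead of lxml only to avoid the extra dependency. The names classes and titles are just for illustration.

```python
from bs4 import BeautifulSoup

# Assumed markup: same structure as the question, with the class attributes present.
html_doc = """
<tr>
<td>Direction du vent</td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_N" title="title.wind_N"></div></center></td>
<td><center><div class="mm_detail_SE" title="title.wind_SE"></div></center></td>
</tr>
"""

soup = BeautifulSoup(html_doc, "html.parser")
divs = soup.select('div[class*="mm_detail"]')

# class is multi-valued, so .get("class") returns a list; join it into one string.
classes = [" ".join(div.get("class", [])) for div in divs]

# title is single-valued, so .get("title") already returns a string.
titles = [div.get("title") for div in divs]

print(classes)
print(titles)
```

get_text() returned nothing in the question because the divs have no text content; the information lives in their attributes, which is why .get() is used here.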