I am trying to extract the value of the 'title' attribute from the following HTML, but so far I haven't managed to.
<div title="22.12.2022 01:49:03 UTC-03:00">
This is my code:
from bs4 import BeautifulSoup
with open("messages.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')
results = soup.find_all('div', attrs={'class':'pull_right date details'})
print(results)
The output is a list of all the matching <div> elements in the HTML file, not the title values themselves.
CodePudding user response:
To access the value of the title attribute, simply index the tag with ['title'].
If you use find_all, it returns a list of tags, so you need to pick an element by index first (e.g. [0]['title']).
For example:
from bs4 import BeautifulSoup
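# The div needs the class used in the search below; without it find_all returns an empty list
fp = '<html><div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00"></div></html>'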
soup = BeautifulSoup(fp, 'html.parser')
results = soup.find_all('div', attrs={'class':'pull_right date details'})
print(results[0]['title'])
Or:
results = soup.find('div', attrs={'class':'pull_right date details'})
print(results['title'])
Output:
22.12.2022 01:49:03 UTC-03:00
22.12.2022 01:49:03 UTC-03:00
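If the file contains many such divs, you probably want every title rather than just the first one. Here is a minimal sketch of that. It assumes the divs carry the class string from the question's code. The sample markup and the helper name collect_titles are made up for illustration. It uses tag.get('title'), which returns None instead of raising KeyError when a div has no title attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the class string is taken from the question's code.
html = """
<div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00"></div>
<div class="pull_right date details" title="22.12.2022 01:50:10 UTC-03:00"></div>
<div class="pull_right date details"></div>
"""


def collect_titles(soup):
    """Return the title attribute of every matching div that has one."""
    titles = []
    for div in soup.find_all('div', attrs={'class': 'pull_right date details'}):
        title = div.get('title')  # None when the attribute is missing
        if title is not None:
            titles.append(title)
    return titles


soup = BeautifulSoup(html, 'html.parser')
titles = collect_titles(soup)
print(titles)
```

Note that passing a string with spaces as the class value only matches when the class attribute is exactly that string, in that order. If the order of the classes might vary in your file, soup.select('div.pull_right.date.details') is an alternative worth trying.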