How to extract specific part of html using Beautifulsoup?

Time:12-31

I am trying to extract the value of the 'title' attribute from the following HTML, but so far I haven't managed to.

<div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00">

This is my code:

from bs4 import BeautifulSoup

with open("messages.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

results = soup.find_all('div', attrs={'class':'pull_right date details'})

print(results)

The output is a list of all the matching <div> elements in the HTML file, not the title values.

CodePudding user response:

To access the value of the title attribute, index the tag with ['title'].

find_all returns a list, so you need to pick an element by index first (e.g. [0]['title']).

For example:

from bs4 import BeautifulSoup

fp = '<html><div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00"></div></html>'
soup = BeautifulSoup(fp, 'html.parser')

results = soup.find_all('div', attrs={'class':'pull_right date details'})

print(results[0]['title'])

Or:

results = soup.find('div', attrs={'class':'pull_right date details'})

print(results['title'])

Output:

22.12.2022 01:49:03 UTC-03:00
22.12.2022 01:49:03 UTC-03:00
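
Since the question's file contains many such divs, here is a minimal sketch for collecting every title at once. It assumes the divs carry the classes pull_right, date and details; the sample HTML (including the second timestamp) is made up for illustration and stands in for messages.html. The CSS selector matches the three classes in any order, and [title] skips divs that have no title attribute, which would otherwise raise a KeyError on div['title'].

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for messages.html; the second timestamp is invented.
html = """
<div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00"></div>
<div class="pull_right date details" title="22.12.2022 01:50:10 UTC-03:00"></div>
<div class="pull_right date details"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector: divs with all three classes AND a title attribute.
titles = [div["title"] for div in soup.select("div.pull_right.date.details[title]")]

for title in titles:
    print(title)
```

If you prefer to stay with find_all, tag.get('title') returns None instead of raising when the attribute is missing.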