How to extract datetime from chunk of HTML

Time:01-09

I have a piece of HTML that includes a datetime, like this:

<time datetime="2023-01-06 05:00:00" data-format="article-display" data-show-date="always" data-show-time="today-only" data-timestamp="1672981200" itemprop="datePublished"  full-date="05.01.2023">6th January</time>

I used the copy option in Chrome's inspector, and it returned this selector:

#article > div.mar-article > div > div.mar-article__timestamp > time

def extract_time(data):
    """Extract the time from the HTML of the article page."""
    soup = BeautifulSoup(data, 'html.parser')
    # Use the select_one() method to find the time element
    time_element = soup.find("time", class_="datetime")
    print(time_element)
    return time_element

Why does it return None?

I'm confused as I don't know how to return just the datetime.

Answer:

The element does not have a class called datetime, which is why find("time", class_="datetime") returns None. You can select it by its datetime attribute instead (provided that the element is actually present in the soup):

soup.select_one('time[datetime]').get('datetime')

Example

from bs4 import BeautifulSoup

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup('<time datetime="2023-01-06 05:00:00" data-format="article-display" data-show-date="always" data-show-time="today-only" data-timestamp="1672981200" itemprop="datePublished"  full-date="05.01.2023">6th January</time>', 'html.parser')

print(soup.select_one('time[datetime]').get('datetime'))

Output

2023-01-06 05:00:00
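
Applied to the function from the question, the same selector could look like the sketch below. It assumes the datetime attribute always has the "YYYY-MM-DD HH:MM:SS" shape seen in the sample, and it returns None when the tag is missing rather than raising. Converting the string to a datetime object is an optional extra step that the answer above does not include.

```python
from datetime import datetime

from bs4 import BeautifulSoup


def extract_time(data):
    """Return the <time> tag's datetime attribute as a datetime, or None."""
    soup = BeautifulSoup(data, "html.parser")
    # Match a <time> tag that carries a datetime attribute (not a class).
    time_element = soup.select_one("time[datetime]")
    if time_element is None:
        return None  # the tag is not in the HTML that was passed in
    # Assumed format, taken from the sample: "YYYY-MM-DD HH:MM:SS".
    return datetime.strptime(time_element["datetime"], "%Y-%m-%d %H:%M:%S")


html = '<time datetime="2023-01-06 05:00:00" data-timestamp="1672981200">6th January</time>'
print(extract_time(html))
```

If only the raw string is needed, return time_element["datetime"] directly and skip the strptime call.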