I have an issue related to scraping a date from a website using Python and BeautifulSoup

Time:10-07

I have an issue scraping a date from a website using Python and BeautifulSoup. I am stuck on the splitting step: .split('.', "") does not work for extracting only the date from this p tag: <p >Oct 24, 2017 • 4 min read</p> I don't want the dot or the "4 min read" part, only the date.

Published_Date = soup.select_one('p[]').get('datetime')

Answer:

  1. The bold bullet (•) in the text is a different character from the period (.) you are passing to split(), so splitting on '.' finds nothing to split on.

  2. So replace the bullet with another symbol such as |, split on that symbol, and take the first element using list indexing ([0]).
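To see why the original call had no effect, this small plain-Python check (no BeautifulSoup needed) compares the two characters using the sample text from the question. Note also that .split('.', "") as written raises a TypeError, because the second argument of str.split is maxsplit and must be an integer. Splitting on '•' directly, without the intermediate replace, is shown here as an alternative; it is not part of the original answer.

```python
# Sample text as it appears inside the <p> tag in the question
text = "Oct 24, 2017 • 4 min read"

# The separator is U+2022 BULLET, not U+002E FULL STOP
print(hex(ord("•")), hex(ord(".")))  # 0x2022 0x2e
print("." in text)                   # False: there is no period to split on

# The question's call fails outright: maxsplit must be an int, not ""
try:
    text.split(".", "")
except TypeError as exc:
    print("TypeError:", exc)

# Splitting on the bullet itself also works; strip() drops the leftover space
date_part = text.split("•")[0].strip()
print(date_part)                     # Oct 24, 2017
```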

Example:

from bs4 import BeautifulSoup

html = '''
<p class="text-xs">Oct 24, 2017 • 4 min read</p>

'''

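soup = BeautifulSoup(html, 'html.parser')

# get_text(strip=True) returns the tag's text with surrounding whitespace trimmed
date = soup.select_one('p.text-xs').get_text(strip=True)
# Swap the bullet for '|', split on it, keep the first part, trim the trailing space
print(date.replace('•', '|').split('|')[0].strip())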

Output:

Oct 24, 2017
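If you need an actual date value rather than a string, the extracted text can be parsed with datetime.strptime. This is a minimal sketch, assuming the site always uses the "Mon DD, YYYY" format seen in the question with English month abbreviations; the helper name parse_published_date and the format string '%b %d, %Y' are illustrative assumptions. It uses str.partition, which splits once on the bullet, instead of replace plus split, and takes the string returned by the get_text(strip=True) call above.

```python
from datetime import datetime


def parse_published_date(raw: str) -> datetime:
    """Hypothetical helper: parse text like 'Oct 24, 2017 • 4 min read'."""
    # partition splits once on the bullet (U+2022) and always returns a 3-tuple;
    # if there is no bullet, the whole string ends up in date_text
    date_text, _sep, _rest = raw.partition("•")
    # '%b %d, %Y' matches strings like 'Oct 24, 2017'
    return datetime.strptime(date_text.strip(), "%b %d, %Y")


published = parse_published_date("Oct 24, 2017 • 4 min read")
print(published.date())  # 2017-10-24
```

Unlike indexing into a split() result, partition never raises IndexError, so a missing bullet only means the whole text is treated as the date.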