Parse the published date of an article on Medium with Beautiful Soup (Python)

Time:11-06

I need to parse the "published date" of articles on Medium using Beautiful Soup. I can already parse the author, title, and reading time in a loop, but for some reason the "published date" is not working for me.

Here is an example page:

https://medium.com/interlay/archive/2020

So the output of parsing should be Jun 18, 2020; Mar 5, 2020; Feb 23, 2020, etc.

CodePudding user response:

The date is inside the <time> tag of every article <div>.

Select each <time> tag and print its text.

Here is the code:

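import requests
from bs4 import BeautifulSoup

url = 'https://medium.com/interlay/archive/2020'
r = requests.get(url)
# The 'lxml' parser needs the lxml package installed
soup = BeautifulSoup(r.text, 'lxml')

# Collect the visible text of every <time> tag on the page
t = [x.text.strip() for x in soup.find_all('time')]
print(t)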

Output:

['Jun 18, 2020', 'Mar 4, 2020', 'Feb 23, 2020', 'Nov 30, 2020', 'Apr 15, 2020', 'Aug 21, 2020', 'Oct 27, 2020']
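
If you need real date values rather than strings (for sorting or filtering, say), the text from each <time> tag can be converted with the standard library. This is only a minimal sketch: the helper name parse_medium_date is made up for illustration, and it assumes the page keeps the "Mon D, YYYY" format shown above and that month names are in English.

```python
from datetime import datetime

# Sample strings copied from the scraped output above
raw_dates = ['Jun 18, 2020', 'Mar 4, 2020', 'Feb 23, 2020']


def parse_medium_date(text):
    """Convert a string such as 'Jun 18, 2020' to a datetime.date.

    Hypothetical helper; assumes the 'Mon D, YYYY' format and English month names.
    """
    # %b matches the abbreviated month name, %d the day, %Y the four-digit year
    return datetime.strptime(text.strip(), '%b %d, %Y').date()


parsed = [parse_medium_date(d) for d in raw_dates]
print([d.isoformat() for d in parsed])
# ['2020-06-18', '2020-03-04', '2020-02-23']
```

strptime raises ValueError if a string does not match the format, so wrap the call in try/except if the page may contain other date formats.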

CodePudding user response:

import requests
from bs4 import BeautifulSoup

url = 'https://medium.com/interlay/archive/2020'

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

You can find the article <div> tags by their class, then loop over them to get the text of the <time> tag in each one:

main_div = soup.find_all("div", class_="streamItem streamItem--postPreview js-streamItem")
for div in main_div:
    print(div.find("time").text)

Output:

Jun 18, 2020
Mar 4, 2020
Feb 23, 2020
Nov 30, 2020
Apr 15, 2020
Aug 21, 2020
Oct 27, 2020
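
One thing to watch for in the loop above: find() returns None when a <div> has no <time> tag, so reading .text would raise AttributeError. Here is a hedged sketch of a guarded version. The HTML below is made-up sample markup that only imitates the class names used above, not the real Medium page, so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the archive page; the real page may differ
sample_html = """
<div class="streamItem streamItem--postPreview js-streamItem">
  <h3>First post</h3>
  <time>Jun 18, 2020</time>
</div>
<div class="streamItem streamItem--postPreview js-streamItem">
  <h3>Post without a date</h3>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

dates = []
# Matching on a single class is less brittle than matching the full class string
for div in soup.select('div.streamItem--postPreview'):
    time_tag = div.find('time')
    # Guard against a missing <time> tag before reading its text
    dates.append(time_tag.get_text(strip=True) if time_tag else None)

print(dates)
# ['Jun 18, 2020', None]
```

Storing None for a missing date keeps the list aligned with the authors and titles collected in the same loop.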
