I need to parse the "published date" of articles on Medium using Beautiful Soup. I have successfully parsed the author, title, and reading time in a loop, but for some reason the "published date" is not working for me.
Here is an example:
https://medium.com/interlay/archive/2020
So the output of parsing would be Jun 18, 2020; Mar 5, 2020; Feb 23, 2020, etc.
CodePudding user response:
The date is inside the <time> tag of each article's <div>. Select those <time> tags and print their text.
Here is the code:
import requests
from bs4 import BeautifulSoup
url = 'https://medium.com/interlay/archive/2020'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
t = [x.text.strip() for x in soup.find_all('time')]
print(t)
['Jun 18, 2020', 'Mar 4, 2020', 'Feb 23, 2020', 'Nov 30, 2020', 'Apr 15, 2020', 'Aug 21, 2020', 'Oct 27, 2020']
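The list above comes back in page order rather than chronological order. If the dates need to be sorted or compared, one option is to convert the strings into `date` objects. This is only a minimal sketch: it reuses the strings printed above and assumes they always follow the `Mon D, YYYY` pattern with English month abbreviations.

```python
from datetime import datetime

# Strings as printed by the scraper above
raw_dates = ['Jun 18, 2020', 'Mar 4, 2020', 'Feb 23, 2020', 'Nov 30, 2020',
             'Apr 15, 2020', 'Aug 21, 2020', 'Oct 27, 2020']

# Assumption: '%b' matches English month abbreviations (default C locale)
parsed = [datetime.strptime(s, '%b %d, %Y').date() for s in raw_dates]

# Print the dates in chronological order, ISO formatted
for d in sorted(parsed):
    print(d.isoformat())
```

A string in any other format would raise `ValueError`, so real pages may need a fallback.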
CodePudding user response:
import requests
from bs4 import BeautifulSoup
url='https://medium.com/interlay/archive/2020'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
You can find the main_div tags by their class and loop over them to get the text from each <time> tag:
main_div=soup.find_all("div",class_="streamItem streamItem--postPreview js-streamItem")
for div in main_div:
    print(div.find("time").text)
Output:
Jun 18, 2020
Mar 4, 2020
Feb 23, 2020
Nov 30, 2020
Apr 15, 2020
Aug 21, 2020
Oct 27, 2020
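One caveat with the loop above: `div.find("time")` returns `None` when a preview has no <time> tag, and calling `.text` on it then raises `AttributeError`. A defensive variant might look like the sketch below. The HTML is a hypothetical, simplified stand-in for the real page (only the class names are taken from the answer), so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, simplified; the second preview deliberately has no <time>
html = """
<div class="streamItem streamItem--postPreview js-streamItem"><time>Jun 18, 2020</time></div>
<div class="streamItem streamItem--postPreview js-streamItem"><span>no date here</span></div>
<div class="streamItem streamItem--postPreview js-streamItem"><time> Mar 4, 2020 </time></div>
"""

soup = BeautifulSoup(html, 'html.parser')

dates = []
# Matching on a single class is less brittle than the full class string
for div in soup.find_all('div', class_='streamItem--postPreview'):
    time_tag = div.find('time')
    # find() returns None when the preview contains no <time> tag
    dates.append(time_tag.get_text(strip=True) if time_tag else None)

print(dates)
```

Previews without a date appear as `None` in the result instead of stopping the loop.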