I am trying to read the 'datetime' attribute but haven't been able to get it right for the past couple of hours:
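import time
from bs4 import BeautifulSoup

# driver is an existing Selenium WebDriver instance (setup not shown)
driver.get("https://cointelegraph.com/tags/bitcoin")
time.sleep(10)  # wait for the page to finish rendering
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
#print(soup.prettify())
articles = soup.find_all("article")
for article in articles:
    print("--------------------------------")
    # checks only the attributes of the <article> tag itself
    if article.has_attr('datetime'):
        print(article['datetime'])
    else:
        print('no attribute present')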
When I run this, it seems that the attribute is not there:
--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------
I checked the HTML and the 'datetime' attribute is there, inside the 'article' tag. But the 'article' tag itself looks like it has only one attribute, which is 'class'.
<article data-v-a5013924="">
<a href="/news/top-5-cryptocurrencies-to-watch-this-week-btc-xrp-link-bch-fil">
<figure >
<div >
<span >
<span >
</span>
</span>
<!-- -->
<img alt="Top 5 cryptocurrencies to watch this week: BTC, XRP, LINK, BCH, FIL" pinger-seen="true" src="https://images.cointelegraph.com/images/370_aHR0cHM6Ly9zMy5jb2ludGVsZWdyYXBoLmNvbS91cGxvYWRzLzIwMjItMDQvYWJlMzJhMjYtMmMwMi00ODczLTllNGUtYWQ2ZTdmMzEzOGNlLmpwZw==.jpg" srcset="https://images.cointelegraph.com/images/370_aHR0cHM6Ly9zMy5jb2ludGVsZWdyYXBoLmNvbS91cGxvYWRzLzIwMjItMDQvYWJlMzJhMjYtMmMwMi00ODczLTllNGUtYWQ2ZTdmMzEzOGNlLmpwZw==.jpg
1x, https://images.cointelegraph.com/images/740_aHR0cHM6Ly9zMy5jb2ludGVsZWdyYXBoLmNvbS91cGxvYWRzLzIwMjItMDQvYWJlMzJhMjYtMmMwMi00ODczLTllNGUtYWQ2ZTdmMzEzOGNlLmpwZw==.jpg 2x"/>
</div>
<span >
Price Analysis
</span>
</figure>
</a>
<div >
<div >
<a href="/news/top-5-cryptocurrencies-to-watch-this-week-btc-xrp-link-bch-fil">
<span >
Top 5 cryptocurrencies to watch this week: BTC, XRP, LINK, BCH, FIL
</span>
</a>
<div >
<time datetime="2022-04-17">
4 hours ago
</time>
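...
CodePudding user response:
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "mozila/5.0"}
url = 'https://cointelegraph.com/tags/bitcoin'
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')
# select the <time> tags directly; they are the ones carrying 'datetime'
for dt in soup.select('time.post-card-inline__date'):
    date_time = dt.get('datetime')
    print(date_time)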
Output:
2022-04-17
2022-04-17
2022-04-17
2022-04-16
2022-04-16
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-14
CodePudding user response:
The problem is that datetime is not an attribute of the article tag. You need to search further, inside each article, for the tags that do have that attribute, using find_all(datetime=True), and then you can read its value without problems.
...
articles = soup.find_all("article")
for article in articles:
    # find_all(datetime=True) matches any descendant that has a 'datetime' attribute
    for time_tag in article.find_all(datetime=True):
        print(time_tag['datetime'])
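To see the difference without hitting the site, here is a minimal sketch that runs against a small inline HTML string. The markup is a hypothetical, trimmed-down imitation of the snippet in the question (the hrefs, link text, and the class on the article tag are made up for illustration), so the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the HTML shown in the question.
html = """
<article class="post-card-inline">
  <a href="/news/example-one">Example one</a>
  <time class="post-card-inline__date" datetime="2022-04-17">4 hours ago</time>
</article>
<article class="post-card-inline">
  <a href="/news/example-two">Example two</a>
  <time class="post-card-inline__date" datetime="2022-04-16">1 day ago</time>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
articles = soup.find_all("article")

# has_attr() looks only at the <article> tag itself, never at its children.
article_has_it = [article.has_attr("datetime") for article in articles]

# Searching the descendants finds the <time> tags that carry the attribute.
dates = []
for article in articles:
    for time_tag in article.find_all(datetime=True):
        dates.append(time_tag["datetime"])

print(article_has_it)  # [False, False]
print(dates)           # ['2022-04-17', '2022-04-16']
```

If you only care about the dates and not about which article they belong to, calling soup.find_all(datetime=True) on the whole document should give the same values in one pass.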