getting an attribute from a tag with beautiful soup

Time:04-18

I have been trying to get the 'datetime' attribute for the past couple of hours, but can't seem to get it right:

    driver.get("https://cointelegraph.com/tags/bitcoin")
    time.sleep(10)
   
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')
    #print(soup.prettify())

    articles = soup.find_all("article")
 
    for article in articles:
        print("--------------------------------")
        if article.has_attr('datetime'):
            print(article['datetime'])
        else:
            print('no attribute present')

When I run this, it seems that the attribute is not there:

--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------
no attribute present
--------------------------------

I checked the HTML and the 'datetime' attribute is there inside the 'article' tag, but the 'article' tag itself seems to have only one attribute, 'class'.

<article  data-v-a5013924="">
 <a  href="/news/top-5-cryptocurrencies-to-watch-this-week-btc-xrp-link-bch-fil">
  <figure >
   <div >
    <span >
     <span >
     </span>
    </span>
    <!-- -->
    <img alt="Top 5 cryptocurrencies to watch this week: BTC, XRP, LINK, BCH, FIL"  pinger-seen="true" src="https://images.cointelegraph.com/images/370_aHR0cHM6Ly9zMy5jb2ludGVsZWdyYXBoLmNvbS91cGxvYWRzLzIwMjItMDQvYWJlMzJhMjYtMmMwMi00ODczLTllNGUtYWQ2ZTdmMzEzOGNlLmpwZw==.jpg" srcset="https://images.cointelegraph.com/images/370_aHR0cHM6Ly9zMy5jb2ludGVsZWdyYXBoLmNvbS91cGxvYWRzLzIwMjItMDQvYWJlMzJhMjYtMmMwMi00ODczLTllNGUtYWQ2ZTdmMzEzOGNlLmpwZw==.jpg
1x, https://images.cointelegraph.com/images/740_aHR0cHM6Ly9zMy5jb2ludGVsZWdyYXBoLmNvbS91cGxvYWRzLzIwMjItMDQvYWJlMzJhMjYtMmMwMi00ODczLTllNGUtYWQ2ZTdmMzEzOGNlLmpwZw==.jpg 2x"/>
   </div>
   <span >
    Price Analysis
   </span>
  </figure>
 </a>
 <div >
  <div >
   <a  href="/news/top-5-cryptocurrencies-to-watch-this-week-btc-xrp-link-bch-fil">
    <span >
     Top 5 cryptocurrencies to watch this week: BTC, XRP, LINK, BCH, FIL
    </span>
   </a>
   <div >
    <time  datetime="2022-04-17">
     4 hours ago
    </time>
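    <p
...

CodePudding user response:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "mozila/5.0"}
url = 'https://cointelegraph.com/tags/bitcoin'

req = requests.get(url, headers=headers)

soup = BeautifulSoup(req.content, 'html.parser')

# The datetime attribute sits on the <time> tag, not on <article>
for dt in soup.select('time.post-card-inline__date'):
    date_time = dt.get('datetime')
    print(date_time)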

Output:

2022-04-17
2022-04-17
2022-04-17
2022-04-16
2022-04-16
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-15
2022-04-14
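
The class name in the selector above depends on the site's markup at the time. As a minimal sketch, assuming the attribute sits on a nested `<time>` tag as in the question's HTML, a CSS attribute selector can match on `datetime` itself. The `html` string here is a made-up sample, not the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the snippet in the question.
html = """
<article>
  <a href="/news/first">First</a>
  <time datetime="2022-04-17">4 hours ago</time>
</article>
<article>
  <a href="/news/second">Second</a>
  <time datetime="2022-04-16">1 day ago</time>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# "time[datetime]" matches every <time> tag that carries a datetime
# attribute, regardless of its class.
dates = [tag["datetime"] for tag in soup.select("time[datetime]")]
print(dates)
```

This avoids hard-coding a class that the site may rename, at the cost of also matching any other `<time datetime=...>` tags on the page.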

CodePudding user response:

The problem is that datetime is not an attribute of the article tag. You need to search further inside each article for the tags that carry that attribute, using find_all(datetime=True); then you can read its value without problems.

...
articles = soup.find_all("article")

for article in articles:
    for time_tag in article.find_all(datetime=True):
        print(time_tag['datetime'])
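
To see why `has_attr` fails on the article and the nested lookup works, here is a small offline sketch. The `html` string is a hypothetical sample (the second article deliberately has no `<time>` tag), and `find(datetime=True)` is used instead of `find_all` on the assumption that one date per article is wanted:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second article has no <time> tag on purpose.
html = """
<article class="post-card">
  <a href="/news/first">First</a>
  <time datetime="2022-04-17">4 hours ago</time>
</article>
<article class="post-card">
  <a href="/news/second">Second</a>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for article in soup.find_all("article"):
    # has_attr only looks at the <article> tag's own attributes (here: class).
    own = article.has_attr("datetime")
    # find() searches descendants and returns the first tag that has a
    # datetime attribute, or None if there is none.
    time_tag = article.find(datetime=True)
    value = time_tag["datetime"] if time_tag is not None else None
    results.append((own, value))

print(results)
```

Checking for `None` before indexing keeps the loop from raising a `TypeError` on articles that have no dated tag.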