BeautifulSoup select element from html

Time:09-16

Using bs4, I am trying to extract these values from an HTML page, but I am not sure how to get them. How can I achieve this?

Values:

721,073.7
6.172119857741811412

html:

<hr class='hr-space'><div class='row'><div class='col-md-3 font-weight-bold font-weight-sm-normal mb-1 mb-md-0'><div class='d-flex align-items-center'><span class='mr-1' title='2 Token Transfers'><i class='fal fa-question-circle text-secondary d-none d-sm-inline-block mr-1' data-container='body' data-toggle='popover' data-placement='top' data-original-title='' title='' data-content='List of token transferred in the transaction.'></i>Tokens Transferred: </span><span class='badge badge-pill badge-secondary align-midle'>2</span></div></div><div class='col-md-9'><ul class='list-unstyled mb-0' id='wrapperContent'><li class='media align-items-baseline mb-2'><span class='row-count text-secondary small mr-1'><i class='fa fa-caret-right'></i></span><div class='media-body'><span class=''><b>From</b> </span><span class='hash-tag text-truncate  mr-1'><a href='/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0x10ed43c718714eb63d5aa57b78b54704e256024e'><span class='hash-tag text-truncate hash-tag-custom-from tooltip-address' data-toggle='tooltip' title='PancakeSwap: Router v2 (0x10ed43c718714eb63d5aa57b78b54704e256024e)'>PancakeSwap: Router v2</span></a></span><span class='mr-1'><b>To</b> </span><span class='hash-tag text-truncate '><a href='/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0xc2b6da6fdc762f1399697432da46e0cab97166ea'><span class='hash-tag text-truncate hash-tag-custom-to tooltip-address' data-toggle='tooltip' title='PancakeSwap V2: DXT 9 (0xc2b6da6fdc762f1399697432da46e0cab97166ea)'>PancakeSwap V2: DXT 9</span></a></span><span class='mr-1'> <b>For</b> </span><span class='mr-1'><span data-toggle="tooltip" data-original-title="Current Price : $432.65 / WBNB">6.172119857741811412 ($2,670.38)</span> </span> <img src='/token/images/binance_32.png' class='mt-n1 mr-1' width='15'><a href='/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c'>Wrapped BNB (WBNB)</a></div></li><li class='media align-items-baseline mb-2'><span class='row-count text-secondary small 
mr-1'><i class='fa fa-caret-right'></i></span><div class='media-body'><span class=''><b>From</b> </span><span class='hash-tag text-truncate  mr-1'><a href='/token/0x2b2ff80c489dad868318a19fd6f258889a026da5?a=0xc2b6da6fdc762f1399697432da46e0cab97166ea'><span class='hash-tag text-truncate hash-tag-custom-from tooltip-address' data-toggle='tooltip' title='PancakeSwap V2: DXT 9 (0xc2b6da6fdc762f1399697432da46e0cab97166ea)'>PancakeSwap V2: DXT 9</span></a></span><span class='mr-1'><b>To</b> </span><span class='hash-tag text-truncate '><a href='/token/0x2b2ff80c489dad868318a19fd6f258889a026da5?a=0x1453ce6f063231bb062152b804531d3c3e0a8240'><span class='hash-tag text-truncate hash-tag-custom-to tooltip-address' data-toggle='tooltip' title='0x1453ce6f063231bb062152b804531d3c3e0a8240'>0x1453ce6f063231bb062152b804531d3c3e0a8240</span></a></span><span class='mr-1'> <b>For</b> </span><span class='mr-1'>721,073.7 </span><img src='/images/main/empty-token.png' class='mt-n1 mr-1' width='15'><a href='/token/0x2b2ff80c489dad868318a19fd6f258889a026da5'>DEXIT (DXT) </a></div></li></ul></div></div>

My code:

import requests
from bs4 import BeautifulSoup


test_url = 'https://bscscan.com/tx/0x0467727ae96fc0226bc876f0d3aa2332f1da3cec9599e52c683c74f60ec01409'
page = requests.get(test_url)

soup = BeautifulSoup(page.content, "html.parser")
ul = soup.find_all('div', {'class': 'col-md-3 font-weight-bold font-weight-sm-normal mb-1 mb-md-0'})

CodePudding user response:

  1. I used a CSS selector to find the li elements, which returns a list of two items.

  2. You can iterate over that list. In each li, the value is inside a span tag with the class mr-1, and it is the last such span, so it can be picked out of the find_all result with the index -1.

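# Each token transfer is one <li>; the selector matches both of them
data = soup.select("li.media.align-items-baseline.mb-2")
for li in data:
    # The amount is in the last <span class="mr-1">; keep the text before the first space
    print(li.find_all("span", class_="mr-1")[-1].get_text().split(" ")[0])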

Output:

6.172119857741811412
721,073.7
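
For reference, the same approach can be tried without hitting the network. The following is only a minimal sketch: SAMPLE_HTML is a trimmed-down stand-in for the markup in the question, and extract_amounts is a hypothetical helper name, not part of bs4. It assumes the live page keeps the same li / span class layout, and it skips any li that has no matching span instead of raising an IndexError.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the markup in the question (assumption: the live
# page keeps the same <li class="media ..."> / <span class="mr-1"> layout).
SAMPLE_HTML = """
<ul class='list-unstyled mb-0' id='wrapperContent'>
  <li class='media align-items-baseline mb-2'>
    <div class='media-body'>
      <span class='mr-1'><b>To</b> </span>
      <span class='mr-1'> <b>For</b> </span>
      <span class='mr-1'><span data-toggle="tooltip">6.172119857741811412 ($2,670.38)</span> </span>
      <img src='/token/images/binance_32.png' class='mt-n1 mr-1' width='15'>
      <a href='/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c'>Wrapped BNB (WBNB)</a>
    </div>
  </li>
  <li class='media align-items-baseline mb-2'>
    <div class='media-body'>
      <span class='mr-1'><b>To</b> </span>
      <span class='mr-1'> <b>For</b> </span>
      <span class='mr-1'>721,073.7 </span>
      <img src='/images/main/empty-token.png' class='mt-n1 mr-1' width='15'>
      <a href='/token/0x2b2ff80c489dad868318a19fd6f258889a026da5'>DEXIT (DXT) </a>
    </div>
  </li>
</ul>
"""


def extract_amounts(html):
    """Return the transferred amount of each <li> as a string (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    amounts = []
    for li in soup.select("li.media.align-items-baseline.mb-2"):
        # Only <span> tags are searched, so the <img class="... mr-1"> is ignored.
        spans = li.find_all("span", class_="mr-1")
        if not spans:
            continue  # layout differs from what this sketch assumes
        text = spans[-1].get_text(strip=True)
        if text:
            # Drop anything after the amount, e.g. the "($2,670.38)" part.
            amounts.append(text.split()[0])
    return amounts


if __name__ == "__main__":
    for amount in extract_amounts(SAMPLE_HTML):
        print(amount)
```

The amounts come back as strings, so 721,073.7 still contains its thousands separator. If a number is needed, something like Decimal(amount.replace(",", "")) from the standard decimal module keeps all 18 decimal places of the first value, which a float would not.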