Grabbing some data from a card-body div class

Good day. My script is a work in progress and I need help or ideas to make it work properly. I can grab some data, but the output is not very readable or useful.

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}
url = "https://bscscan.com/tx/0xb07b68f72f0b58e8cfb8c8e896736f49b13775ebda25301475d24554a601ff97#eventlog"

urlpage = requests.get(url, headers=headers, timeout=10, allow_redirects=False)
soup = BeautifulSoup(urlpage.content, 'html.parser')
price = soup.find('div', class_='d-none d-md-inline-block u-label u-label--price rounded mt-1 ml-n1 text-nowrap').get_text()#.strip()
print ("Price: ", price)


data1 = soup.find('div', class_='card-body').get_text()#.strip()
print (data1)


data2 = soup.find('span', class_='btn btn-icon btn-soft-success rounded-circle').get_text()#.strip() 
print (data2)

Current Output:

Price:  
BNB: $422.35 (-3.05%) |  5 Gwei

Transaction Hash:
0xb07b68f72f0b58e8cfb8c8e896736f49b13775ebda25301475d24554a601ff97
Status:Success

Squeezed text (173 lines).
206

Wanted Output:

Price:  
BNB: $422.35 (-3.05%) |  5 Gwei

207 #-- latest data

Address: 0x81e0ef68e103ee65002d3cf766240ed1c070334d
Topics:  0 0x598cd56214a374d15f638dd04913e0288cd76c7833ee66b15cf55845d875a187
Data
0000000000000000000000000000000000000000000000000000000061b23bae
00000000000000000000000000000000000000000000000000000000979144b0

CodePudding user response:

Here is an alternative that always picks up the latest transaction, even if more transactions are added. Because requests does not run JavaScript, the content you receive is not the same as what appears on the rendered webpage. You need to target the element with id myTabContent.

I've tried to use selector lists that should be more stable, and to avoid some of the potentially less robust classes.

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://bscscan.com/tx/0xb07b68f72f0b58e8cfb8c8e896736f49b13775ebda25301475d24554a601ff97#eventlog', headers = {'User-Agent':'Mozilla/5.0'})
soup = bs(r.content, 'lxml')

#select price info
price = soup.select_one('#ethPrice').get_text(' ', strip = True)

# select latest event
last_transaction = soup.select_one('#myTabContent div.media:nth-last-child(2)')
latest_number = int(last_transaction.select_one('.btn-icon__inner').text)
address = last_transaction.select_one('a.text-break').text
topic = last_transaction.select_one('li > .text-break').text

print('Price:', price)
print('Latest number:', latest_number)
print('Address:', address)
print('Topics:', topic)
print('Data')
for data in last_transaction.select('[id^=chunk].text-break'):
    print(data.text)
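
To see how the selectors above behave without hitting the network, here is a minimal offline sketch. The HTML is a hypothetical, heavily simplified stand-in for the page (the values 0xaaa, 0xbbb, 00bb and so on are made up), so it only illustrates the selector logic and is not a copy of the real bscscan markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: two event entries followed by one
# trailing sibling, so the latest entry is the second-to-last child.
html = """
<div id="myTabContent">
  <div class="media">
    <span class="btn-icon__inner">206</span>
    <a class="text-break" href="#">0xaaa</a>
    <ul><li><span class="text-break">0xtopic206</span></li></ul>
    <span id="chunk_1_1" class="text-break">00aa</span>
  </div>
  <div class="media">
    <span class="btn-icon__inner">207</span>
    <a class="text-break" href="#">0xbbb</a>
    <ul><li><span class="text-break">0xtopic207</span></li></ul>
    <span id="chunk_2_4" class="text-break">00bb</span>
    <span id="chunk_2_5" class="text-break">00cc</span>
  </div>
  <div class="footer-note">end of log</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# :nth-last-child(2) counts element siblings from the end, so this picks
# the last div.media regardless of how many entries precede it.
latest = soup.select_one("#myTabContent div.media:nth-last-child(2)")

latest_number = int(latest.select_one(".btn-icon__inner").text)
address = latest.select_one("a.text-break").text
topic = latest.select_one("li > .text-break").text

# [id^=chunk] matches every element whose id starts with "chunk".
chunks = [el.text for el in latest.select("[id^=chunk].text-break")]

print(latest_number, address, topic, chunks)
```

The second-to-last position is only right if exactly one sibling element follows the final entry, as in this made-up markup; if the page layout differs, the index would need adjusting.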

CodePudding user response:

Selecting all the data you need is a little complex. I used CSS selectors, but you can also use the find_all/find methods.

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}
url = "https://bscscan.com/tx/0xb07b68f72f0b58e8cfb8c8e896736f49b13775ebda25301475d24554a601ff97#eventlog"

urlpage = requests.get(url, headers=headers, timeout=10, allow_redirects=False)
soup = BeautifulSoup(urlpage.content, 'html.parser')
price = soup.find('div', class_='d-none d-md-inline-block u-label u-label--price rounded mt-1 ml-n1 text-nowrap').get_text()#.strip()
print ("Price: ", price)


for card in soup.select('div.media')[1:2]:
    num=card.select_one('[]').text
    print(num)
    address=card.select_one('[] a').text
    print(address)
    topic=card.select_one('[]').text
    print(topic)
    data1=card.select_one('#chunk_2_4').text
    print(data1)

    data2=card.select_one('#chunk_2_5').text
    print(data2)

Output:

Price:  
BNB: $422.41 (-3.65%) |  5 Gwei

207
0x81e0ef68e103ee65002d3cf766240ed1c070334d
0x598cd56214a374d15f638dd04913e0288cd76c7833ee66b15cf55845d875a187
0000000000000000000000000000000000000000000000000000000061b23bae
00000000000000000000000000000000000000000000000000000000979144b0
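
As a rough illustration of the find/find_all alternative mentioned above, the sketch below runs both styles against a tiny inline snippet. The markup and the values in it are hypothetical, and the class names are borrowed from the other answer in this thread rather than verified against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for a single event entry.
html = """
<div class="media">
  <span class="btn-icon__inner">207</span>
  <a class="text-break" href="#">0xbbb</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector style
css_number = soup.select_one("div.media .btn-icon__inner").text
css_address = soup.select_one("div.media a.text-break").text

# find / find_all style: same elements, located by tag name and class
cards = soup.find_all("div", class_="media")
card = cards[0]
find_number = card.find(class_="btn-icon__inner").text
find_address = card.find("a", class_="text-break").text

print(css_number, css_address)
print(find_number, find_address)
```

Both approaches reach the same elements here; CSS selectors tend to be shorter when nesting or attribute matching is involved.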

It's working. The problem was that card.select_one('#chunk_2_5') finds no element on this page and returns None, so calling .text on it raises a NoneType error; everything else is okay:

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}
url = "https://bscscan.com/tx/0x173c462e910c95a67c119c61566330a835e4785221e247fada6d2279052519f1#eventlog"

urlpage = requests.get(url, headers=headers, timeout=10, allow_redirects=False)
soup = BeautifulSoup(urlpage.content, 'html.parser')
price = soup.find('div', class_='d-none d-md-inline-block u-label u-label--price rounded mt-1 ml-n1 text-nowrap').get_text()#.strip()
print ("Price: ", price)


for card in soup.select('div.media')[1:2]:
    num=card.select_one('[]').text
    print(num)
    address=card.select_one('[] a').text
    print(address)
    topic=card.select_one('[]').text
    print(topic)
    data1=card.select_one('#chunk_2_4').text
    print(data1)

    # data2=card.select_one('#chunk_2_5').text
    # print(data2)

Output:

Price:
BNB: $422.25 (-3.15%) |  5 Gwei

315
0x7ee058420e5937496f5a2096f04caa7721cf70cc
0x694af1cc8727cdd0afbdd53d9b87b69248bd490224e9dd090e788546506e076f
0000000000000000000000000000000000000000000000000000000062e6b858
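
Rather than commenting lines out per transaction, one option is to guard against a missing element. The sketch below is only an illustration: text_or_none is a hypothetical helper name, and the inline HTML is a made-up stand-in for an entry that has a #chunk_2_4 element but no #chunk_2_5.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


# Hypothetical markup: only one data chunk is present.
html = """
<div class="media">
  <span id="chunk_2_4">00ab</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.select_one("div.media")

data1 = text_or_none(card, "#chunk_2_4")
data2 = text_or_none(card, "#chunk_2_5")  # no such element, so None

for value in (data1, data2):
    if value is not None:
        print(value)
```

With this guard, the same loop can run on transactions with one data chunk or several without raising an error.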