Good day. My script is on progress and I need help or ideas to make it work properly. I am able to grab some data but its not really that readable and useful and your help and ideas are needed.
from bs4 import BeautifulSoup
import requests
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}
url = "https://bscscan.com/tx/0xb07b68f72f0b58e8cfb8c8e896736f49b13775ebda25301475d24554a601ff97#eventlog"
urlpage = requests.get(url, headers=headers, timeout=10, allow_redirects=False)
soup = BeautifulSoup(urlpage.content, 'html.parser')
price = soup.find('div', class_='d-none d-md-inline-block u-label u-label--price rounded mt-1 ml-n1 text-nowrap').get_text()#.strip()
print ("Price: ", price)
data1 = soup.find('div', class_='card-body').get_text()#.strip()
print (data1)
data2 = soup.find('span', class_='btn btn-icon btn-soft-success rounded-circle').get_text()#.strip()
print (data2)
Current Output:
Price:
BNB: $422.35 (-3.05%) | 5 Gwei
Transaction Hash:
0xb07b68f72f0b58e8cfb8c8e896736f49b13775ebda25301475d24554a601ff97
Status:Success
Squeezed text (173 lines).
206
Wanted Output:
Price:
BNB: $422.35 (-3.05%) | 5 Gwei
207 #-- latest data
Address: 0x81e0ef68e103ee65002d3cf766240ed1c070334d
Topics: 0 0x598cd56214a374d15f638dd04913e0288cd76c7833ee66b15cf55845d875a187
Data
0000000000000000000000000000000000000000000000000000000061b23bae
00000000000000000000000000000000000000000000000000000000979144b0
CodePudding user response:
Alternative which caters for always picking up latest transaction (if more transactions added). Because JavaScript doesn't run with requests
content isn't as it appears on webpage. You need to target the element with id myTabContent
.
I've attempted broadly to go with hopefully more stable selector lists and avoid some of the potentially less robust classes.
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://bscscan.com/tx/0xb07b68f72f0b58e8cfb8c8e896736f49b13775ebda25301475d24554a601ff97#eventlog', headers = {'User-Agent':'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
#select price info
price = soup.select_one('#ethPrice').get_text(' ', strip = True)
# select latest event
last_transaction = soup.select_one('#myTabContent div.media:nth-last-child(2)')
latest_number = int(last_transaction.select_one('.btn-icon__inner').text)
address = last_transaction.select_one('a.text-break').text
topic = last_transaction.select_one('li > .text-break').text
print('Price:', price)
print('Latest number:', latest_number)
print('Address:', address)
print('Topics:', topic)
print('Data')
for data in last_transaction.select('[id^=chunk].text-break'):
print(data.text)
CodePudding user response:
Actually,selecting all data according to requirement a little bit complex.I apply css selector,however, you also can apply find_all/find method.
from bs4 import BeautifulSoup
import requests
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}
url = "https://bscscan.com/tx/0xb07b68f72f0b58e8cfb8c8e896736f49b13775ebda25301475d24554a601ff97#eventlog"
urlpage = requests.get(url, headers=headers, timeout=10, allow_redirects=False)
soup = BeautifulSoup(urlpage.content, 'html.parser')
price = soup.find('div', class_='d-none d-md-inline-block u-label u-label--price rounded mt-1 ml-n1 text-nowrap').get_text()#.strip()
print ("Price: ", price)
for card in soup.select('div.media')[1:2]:
num=card.select_one('[]').text
print(num)
address=card.select_one('[] a').text
print(address)
topic=card.select_one('[]').text
print(topic)
data1=card.select_one('#chunk_2_4').text
print(data1)
data2=card.select_one('#chunk_2_5').text
print(data2)
Output:
Price:
BNB: $422.41 (-3.65%) | 5 Gwei
207
0x81e0ef68e103ee65002d3cf766240ed1c070334d
0x598cd56214a374d15f638dd04913e0288cd76c7833ee66b15cf55845d875a187
0000000000000000000000000000000000000000000000000000000061b23bae
00000000000000000000000000000000000000000000000000000000979144b0
It's working. The problem was data2=card.select_one('#chunk_2_5')
not exist so you are getting None type error but everything is okey:
from bs4 import BeautifulSoup
import requests
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}
url = "https://bscscan.com/tx/0x173c462e910c95a67c119c61566330a835e4785221e247fada6d2279052519f1#eventlog"
urlpage = requests.get(url, headers=headers, timeout=10, allow_redirects=False)
soup = BeautifulSoup(urlpage.content, 'html.parser')
price = soup.find('div', class_='d-none d-md-inline-block u-label u-label--price rounded mt-1 ml-n1 text-nowrap').get_text()#.strip()
print ("Price: ", price)
for card in soup.select('div.media')[1:2]:
num=card.select_one('[]').text
print(num)
address=card.select_one('[] a').text
print(address)
topic=card.select_one('[]').text
print(topic)
data1=card.select_one('#chunk_2_4').text
print(data1)
# data2=card.select_one('#chunk_2_5').text
# print(data2)
Output:
Price:
BNB: $422.25 (-3.15%) | 5 Gwei
315
0x7ee058420e5937496f5a2096f04caa7721cf70cc
0x694af1cc8727cdd0afbdd53d9b87b69248bd490224e9dd090e788546506e076f
0000000000000000000000000000000000000000000000000000000062e6b858