Grabbing links inside the td

Time:04-05

The script below works, but I'd like to add each item's href link to the output to make the data more useful. Any help is appreciated. Thank you.

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}
url = "https://bscscan.com/token/generic-tokenholders2?m=normal&a=0x0D0b63b32595957ae58D4dD60aa5409E79A5Aa96"

s = requests.Session()
r = s.get(url, headers=headers, timeout=5)
soupblockdetails = BeautifulSoup(r.content, 'html.parser')

for row in soupblockdetails.select("tr:has(td)")[:3]:  # max value is 50
    tds = row.find_all("td")
    item1 = tds[0].text.strip()
    item2 = tds[1].text.strip()
    item3 = tds[2].text.strip()
    print("{:<2} {:<43}   {:>25}".format(item1, item2, item3))

Current Output:

1  KIPS: Locked Wallet                            1,870.828693386970691791
2  0xe72d1910c07420a99a2649f40910f692cd87309e         6.849012043043023775
3  0x138fe04c8f7da181765bde237ef5e78546677f5f         2.153134069327832213

Needed Output:

1  KIPS: Locked Wallet                            1,870.828693386970691791      0x81e0ef68e103ee65002d3cf766240ed1c070334d      
2  0xe72d1910c07420a99a2649f40910f692cd87309e         6.849012043043023775      0xe72d1910c07420a99a2649f40910f692cd87309e      
3  0x138fe04c8f7da181765bde237ef5e78546677f5f         2.153134069327832213      0x138fe04c8f7da181765bde237ef5e78546677f5f

Answer:

Select the <a> inside your second <td> and call .get('href') to read its href value. To keep only the address parameter, split the URL on 'a=' and take the last piece:

item4 = row.find_all("td")[1].a.get('href').split('a=')[-1]
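Splitting on 'a=' works for this page, but it keeps everything after the last "a=", so any query parameter that happens to follow the address would end up in the result. A slightly more robust sketch reads the a parameter by name with the standard library's urllib.parse. The href below is an assumed example of the link shape (not copied from the live page), and holder_address is a hypothetical helper name:

```python
from urllib.parse import urlparse, parse_qs


def holder_address(href):
    """Return the value of the 'a' query parameter in href, or None if it is absent."""
    # urlparse works on relative links too: the query string lands in .query
    query = parse_qs(urlparse(href).query)
    values = query.get("a")
    return values[0] if values else None


# Assumed href shape, for illustration only
sample = "/token/0x0D0b63b32595957ae58D4dD60aa5409E79A5Aa96?a=0x81e0ef68e103ee65002d3cf766240ed1c070334d"
print(holder_address(sample))
```

Inside the loop this would replace the split call: item4 = holder_address(row.find_all("td")[1].a.get('href')).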

In your loop:

for row in soupblockdetails.select("tr:has(td)")[:3]:  # max value is 50
    tds = row.find_all("td")
    item1 = tds[0].text.strip()
    item2 = tds[1].text.strip()
    item3 = tds[2].text.strip()
    # Keep only the text after the last "a=" in the link's href (the holder address)
    item4 = tds[1].a.get('href').split('a=')[-1]
    print("{:<2} {:<43}   {:>25} {}".format(item1, item2, item3, item4))

Output:

1  KIPS: Locked Wallet                            1,870.828693386970691791 0x81e0ef68e103ee65002d3cf766240ed1c070334d
2  0xe72d1910c07420a99a2649f40910f692cd87309e         6.849012043043023775 0xe72d1910c07420a99a2649f40910f692cd87309e
3  0x138fe04c8f7da181765bde237ef5e78546677f5f         2.153134069327832213 0x138fe04c8f7da181765bde237ef5e78546677f5f
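One caveat: if a row's second cell ever lacks an <a> tag, .a returns None and the .get('href') call raises AttributeError. The sketch below runs the same loop against a small inline HTML fragment (a made-up stand-in for the bscscan table, so no request is needed) and stores None for rows without a link instead of crashing:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the holders table; not the real page markup
html = """
<table>
  <tr><th>Rank</th><th>Address</th><th>Quantity</th></tr>
  <tr><td>1</td><td><a href="/token/0x0D0b63b32595957ae58D4dD60aa5409E79A5Aa96?a=0x81e0ef68e103ee65002d3cf766240ed1c070334d">KIPS: Locked Wallet</a></td><td>1,870.828693386970691791</td></tr>
  <tr><td>2</td><td>no link here</td><td>6.849012043043023775</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for row in soup.select("tr:has(td)"):  # the header row has only <th>, so it is skipped
    tds = row.find_all("td")
    link = tds[1].find("a")  # None when the cell has no <a>
    href = link.get("href") if link else None
    # Same split as the answer, applied only when an href exists
    address = href.split("a=")[-1] if href else None
    rows.append((tds[0].text.strip(), tds[1].text.strip(), tds[2].text.strip(), address))

for item1, item2, item3, item4 in rows:
    print("{:<2} {:<43}   {:>25} {}".format(item1, item2, item3, item4))
```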