Home > Back-end >  Is there a way to make a dynamic web page automatically run its JavaScript when webscraping with Pyt
Is there a way to make a dynamic web page automatically run its JavaScript when webscraping with Pyt

Time:10-23

I have been getting a lot of issues when trying to do some Python webscraping using BeautifulSoup. Since this particular web page is dynamic, I have been trying to use Selenium first in order to "open" the web page before trying to work with the dynamic content with BeautifulSoup.

The issue I am getting is that the dynamic content is only showing up in my HTML output when I manually scroll through the website upon running the program, otherwise those parts of the HTML remain empty as if I was just using BeautifulSoup by itself without Selenium.

Here is my code:

import time
from bs4 import BeautifulSoup
from selenium import webdriver

if __name__ == "__main__":

    options = webdriver.ChromeOptions()
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--incognito')
    # options.add_argument('--headless')

    driver = webdriver.Chrome("C:\Program Files (x86)\chromedriver.exe", chrome_options=options)
    driver.get('https://coinmarketcap.com/')
    time.sleep(5)

    html = driver.page_source

    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.tbody
    trs = tbody.contents

    for tr in trs:
        print(tr)

    driver.close()

Now if I have Selenium open Chrome with the headless option turned on, I get the same output I would normally get without having pre-loaded the page. The same thing happens if I'm not in headless mode and I simply let the page load by itself, without scrolling through the content manually. Does anyone know why this is? Is there a way to get the dynamic content to load without manually scrolling through each time I run the code?

CodePudding user response:

Actually, data is loaded dynamically by javascipt. So you can grab data easily from api calls json response:

Here is the working example:

Code:

import requests
import json

url= 'https://api.coinmarketcap.com/data-api/v3/cryptocurrency/listing?start=1&limit=100&sortBy=market_cap&sortType=desc&convert=USD,BTC,ETH&cryptoType=all&tagType=all&audited=false&aux=ath,atl,high24h,low24h,num_market_pairs,cmc_rank,date_added,max_supply,circulating_supply,total_supply,volume_7d,volume_30d'
r = requests.get(url)

for item in r.json()['data']['cryptoCurrencyList']:
    name = item['name']
    
    print('crypto_name:'    str(name)) 

Output:

crypto_name:Bitcoin
crypto_name:Ethereum
crypto_name:Binance Coin     
crypto_name:Cardano
crypto_name:Tether
crypto_name:Solana
crypto_name:XRP
crypto_name:Polkadot
crypto_name:USD Coin
crypto_name:Dogecoin
crypto_name:Terra
crypto_name:Uniswap
crypto_name:Wrapped Bitcoin  
crypto_name:Litecoin
crypto_name:Avalanche        
crypto_name:Binance USD      
crypto_name:Chainlink        
crypto_name:Bitcoin Cash     
crypto_name:Algorand
crypto_name:SHIBA INU        
crypto_name:Polygon
crypto_name:Stellar
crypto_name:VeChain
crypto_name:Internet Computer
crypto_name:Cosmos
crypto_name:FTX Token
crypto_name:Filecoin
crypto_name:Axie Infinity
crypto_name:Ethereum Classic
crypto_name:TRON
crypto_name:Bitcoin BEP2
crypto_name:Dai
crypto_name:THETA
crypto_name:Tezos
crypto_name:Fantom
crypto_name:Hedera
crypto_name:NEAR Protocol
crypto_name:Elrond
crypto_name:Monero
crypto_name:Crypto.com Coin
crypto_name:PancakeSwap
crypto_name:EOS
crypto_name:The Graph
crypto_name:Flow
crypto_name:Aave
crypto_name:Klaytn
crypto_name:IOTA
crypto_name:eCash
crypto_name:Quant
crypto_name:Bitcoin SV
crypto_name:Neo
crypto_name:Kusama
crypto_name:UNUS SED LEO
crypto_name:Waves
crypto_name:Stacks
crypto_name:TerraUSD
crypto_name:Harmony
crypto_name:Maker
crypto_name:BitTorrent
crypto_name:Celo
crypto_name:Helium
crypto_name:OMG Network
crypto_name:THORChain
crypto_name:Dash
crypto_name:Amp
crypto_name:Zcash
crypto_name:Compound
crypto_name:Chiliz
crypto_name:Arweave
crypto_name:Holo
crypto_name:Decred
crypto_name:NEM
crypto_name:Theta Fuel
crypto_name:Enjin Coin
crypto_name:Revain
crypto_name:Huobi Token
crypto_name:OKB
crypto_name:Decentraland
crypto_name:SushiSwap
crypto_name:ICON
crypto_name:XDC Network
crypto_name:Qtum
crypto_name:TrueUSD
crypto_name:yearn.finance
crypto_name:Nexo
crypto_name:Celsius
crypto_name:Bitcoin Gold
crypto_name:Curve DAO Token
crypto_name:Mina
crypto_name:KuCoin Token
crypto_name:Zilliqa
crypto_name:Perpetual Protocol
crypto_name:Ren
crypto_name:dYdX
crypto_name:Ravencoin
crypto_name:Synthetix
crypto_name:renBTC
crypto_name:Telcoin
crypto_name:Basic Attention Token
crypto_name:Horizenput:
  • Related