Web scraping an element using BeautifulSoup and Python

I am trying to grab an element from tradingview.com, specifically from a symbol page such as https://www.tradingview.com/symbols/NEARUSD/. I want the price of the symbol for whatever link I give my program. Looking through the page's elements, I can find the price here.

<div class="tv-symbol-price-quote__value js-symbol-last">
    "3.065"
    <span class>57851</span>
</div>

When I run the code below, I get the following output.

#This will not run on online IDE
import requests
from bs4 import BeautifulSoup
  
URL = "https://www.tradingview.com/symbols/NEARUSD/"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')  # 'html.parser' ships with Python, so no extra parser install is needed
L = [soup.find_all(class_ = "tv-symbol-price-quote__value js-symbol-last")] 
print(L)

Output:

[[<div ></div>]]

How can I grab the entire price from this website? I would like the 3.065 as well as the 57851.

CodePudding user response：

This is the most common scraping problem: the page uses JavaScript to add and update elements, but requests/urllib and BeautifulSoup/lxml cannot run JavaScript. You may need Selenium to control a real web browser that can run it. Alternatively, use DevTools in Firefox/Chrome manually (Network tab) to see whether the JavaScript reads its data from some URL, and then try that URL with requests. JavaScript usually receives JSON, which is easily converted to a Python dictionary (no BeautifulSoup needed). You can also check whether the site offers a (free) API for programmers.
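A quick way to confirm this diagnosis is to check whether the target element contains any text in the HTML the server returns. The sketch below runs on two hand-written snippets rather than real responses: one mimics the empty div from the question, the other mimics the same div after JavaScript has filled it in. `has_static_text` is a hypothetical helper name, and the class string is the one used in the question's code.

```python
from bs4 import BeautifulSoup

# Class string taken from the question's code.
PRICE_CLASS = "tv-symbol-price-quote__value js-symbol-last"


def has_static_text(html, class_name=PRICE_CLASS):
    """Return True if the element with this class exists and contains text."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(class_=class_name)
    return element is not None and bool(element.get_text(strip=True))


# Hand-written stand-ins (assumptions), not real responses from the site.
server_html = '<div class="tv-symbol-price-quote__value js-symbol-last"></div>'
rendered_html = (
    '<div class="tv-symbol-price-quote__value js-symbol-last">'
    "3.065<span>57851</span></div>"
)

print(has_static_text(server_html))    # empty placeholder, filled in later by JavaScript
print(has_static_text(rendered_html))  # text is present in the HTML itself
```

If the check fails on `r.content` from requests but the value is visible in the browser, the element is almost certainly populated by JavaScript.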

Using DevTools, I found that the page uses JavaScript to send a POST request (with some JSON data) and gets the fresh price back.

import requests

payload = {
    "columns": ["market_cap_calc", "market_cap_diluted_calc", "total_shares_outstanding", "total_shares_diluted", "total_value_traded"],
    "range": [0, 1],
    "symbols": {"tickers": ["BINANCE:NEARUSD"]}
}

url = 'https://scanner.tradingview.com/crypto/scan'

response = requests.post(url, json=payload)
print(response.text)

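data = response.json()
# "d" holds the values in the same order as "columns":
# d[1] is market_cap_diluted_calc; dividing by the diluted share count (1,000,000,000) gives a per-token value
print(data['data'][0]["d"][1]/1_000_000_000)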

Result:

{"totalCount":1,"data":[{"s":"BINANCE:NEARUSD","d":[2507704855.0467912,3087555230,812197570,1000000000,106737372.9550421]}]}

3.08755523
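Indexing into "d" by position is easy to get wrong, so it may help to pair each value with its column name first. The sketch below works offline on the response text copied from the result above; `row_to_dict` is a hypothetical helper name, and it assumes the values in "d" come back in the same order as the requested columns, which is what the result above suggests.

```python
import json

# Column names in the same order as the "columns" list in the payload.
columns = [
    "market_cap_calc",
    "market_cap_diluted_calc",
    "total_shares_outstanding",
    "total_shares_diluted",
    "total_value_traded",
]

# Response text copied from the result shown above.
response_text = (
    '{"totalCount":1,"data":[{"s":"BINANCE:NEARUSD","d":'
    "[2507704855.0467912,3087555230,812197570,1000000000,106737372.9550421]}]}"
)


def row_to_dict(row, column_names):
    """Pair each value in a row's "d" list with its column name."""
    return dict(zip(column_names, row["d"]))


data = json.loads(response_text)
row = row_to_dict(data["data"][0], columns)

# Diluted market cap divided by diluted share count: an estimate, not a live quote.
estimate = row["market_cap_diluted_calc"] / row["total_shares_diluted"]
print(data["data"][0]["s"], estimate)
```

With a live request, `response.json()` would take the place of `json.loads(response_text)`.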

EDIT:

It seems the code above gives only the market cap. The page uses a WebSocket to get a fresh price every few seconds.

wss://data.tradingview.com/socket.io/websocket?from=symbols/NEARUSD/&date=2022_10_17-11_33

Reading from it would need more complex code.

The other answer (with Selenium) gives you the correct value.

CodePudding user response：

The webpage's contents are loaded dynamically by JavaScript, so you have to use an automation tool such as Selenium, or a hidden API.

Here I use Selenium with bs4 to grab the desired dynamic content.

import time
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = "https://www.tradingview.com/symbols/NEARUSD/"
driver.get(url)
driver.maximize_window()
time.sleep(5)  # Give the page's JavaScript time to fill in the price

# Parse the rendered page; the "lxml" parser requires 'pip install lxml'
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()  # Close the browser once the HTML has been captured

price = soup.find('div', class_="tv-symbol-price-quote__value js-symbol-last").get_text(strip=True)
print(price)

Output:

3.07525163
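Note that get_text(strip=True) joins the text of the div and its nested span into one string, which is why the output has no gap between the two parts. If the two parts are wanted separately, as in the question, `stripped_strings` yields them one at a time. This is a minimal sketch on a hand-written snippet modelled on the element shown in the question (an assumption, not live page source):

```python
from bs4 import BeautifulSoup

# Hand-written snippet modelled on the element from the question.
html = (
    '<div class="tv-symbol-price-quote__value js-symbol-last">'
    "3.065<span>57851</span></div>"
)
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="tv-symbol-price-quote__value js-symbol-last")

parts = list(div.stripped_strings)  # one entry per text node, whitespace stripped
joined = div.get_text(strip=True)   # all text nodes concatenated

print(parts)
print(joined)
```

With Selenium, the same two calls can be applied to the soup built from driver.page_source.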