I am trying to scrape values from a page such as this: https://www.barchart.com/futures/quotes/CBX22/options/nov-22 in R, currently using rvest. Specifically, I want the current price and the implied volatility. Using the SelectorGadget tool, I was able to find the nodes needed for these values.
Using the following, I was able to get the implied volatility:
library(rvest)
html <- read_html("https://www.barchart.com/futures/quotes/CBX22/options/nov-22")
html_text(html_nodes(html, '.text-medium-up-center strong'))
[1] "43.92%"
However, it seems the current price can't be scraped in the same way, since it is a value that updates every few seconds without the page needing to be refreshed. Using the same process as for the implied volatility, but substituting in the node for the price,
html_text(html_nodes(html, '.pricechangerow > .last-change'))
yields
[1] "[[ item.lastPrice ]]" "[[ rootItem.lastPrice ]]"
Is there a way to retrieve whatever the last price happened to be at the time the html is read?
CodePudding user response:
The value you are looking for is stored in a JSON string at the bottom of the HTML and is only written into the relevant HTML element by JavaScript in your browser after the page has loaded. rvest won't change any of the HTML elements; it simply reads the HTML of the page as-is.
To extract the value, you need to parse the JSON and pull the correct element out of the resulting nested list:
read_html("https://www.barchart.com/futures/quotes/CBX22/options/nov-22") %>%
html_element(xpath = '//script[@id="bc-dynamic-config"]') %>%
html_text() %>%
jsonlite::parse_json() %>%
getElement('currentSymbol') %>%
getElement('lastPrice')
#> [1] "90.37"
If I run the same code a few seconds later (equivalent to refreshing the page in your browser), I get:
read_html("https://www.barchart.com/futures/quotes/CBX22/options/nov-22") %>%
html_element(xpath = '//script[@id="bc-dynamic-config"]') %>%
html_text() %>%
jsonlite::parse_json() %>%
getElement('currentSymbol') %>%
getElement('lastPrice')
#> [1] "90.47"