I am trying to use rvest to scrape a single data point from a Vanguard fund page (the URL is in the code below).
What I am attempting to capture is the "Yield As at close 30 Apr 2022" number, which is 2.53%.
I have attempted this using the following code:
library(rvest)

url <- "https://www.vanguardinvestor.co.uk/investments/vanguard-ftse-developed-europe-ex-uk-ucits-etf-eur-distributing/distributions"
url_read <- url %>%
  read_html()
etf_Data <- url_read %>%
  html_nodes(xpath = '/html/body/ukd-app/ukd-pla-nav/div[1]/ukd-fund-detail/div[2]/ukd-distributions/dl/div[2]') %>%
  html_text()
However, it is returning character(0).
Based on previous answers on SO, I have tried to work out whether a passthrough query is required in the URL, but my knowledge is fairly limited, so I have been unable to tell.
I have also tried:
etf_Data <- url_read %>%
  html_element('.caption:contains("Yield As at close 30 Apr 2022") .data') %>%
  html_text2()
and
etf_Data <- url_read %>%
  html_nodes(xpath = '/html/body/ukd-app/ukd-pla-nav/div[1]/ukd-fund-detail/div[2]/ukd-distributions/dl/div[2]') %>%
  html_table()
with the same response.
Any help you could provide would be appreciated.
Thanks C
CodePudding user response:
The problem is that the data is loaded into the page dynamically using JavaScript, so the static HTML that read_html() retrieves does not contain it. You could work around this using RSelenium.
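One way to confirm this diagnosis is to check whether the value appears anywhere in the HTML that rvest actually receives. The sketch below is offline and illustrative only: in_static_html is a hypothetical helper, and the two tiny documents are hand-written stand-ins, not the real Vanguard page.

```r
library(rvest)

# Hypothetical helper: TRUE if `needle` occurs in the text of the parsed HTML
in_static_html <- function(page, needle) {
  grepl(needle, html_text(page), fixed = TRUE)
}

# Hand-written stand-ins (assumptions, not the real page):
# one where the value is present in the markup, one where only a script tag is served
static_page  <- read_html("<html><body><dl><dd>2.53%</dd></dl></body></html>")
dynamic_page <- read_html("<html><body><ukd-app></ukd-app><script src='main.js'></script></body></html>")

in_static_html(static_page, "2.53%")   # TRUE
in_static_html(dynamic_page, "2.53%")  # FALSE
```

Running the same helper on url_read from the question should show whether the yield is in the downloaded HTML at all; if it is not, no XPath or CSS selector can find it.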
A much simpler solution than driving a browser is, with a slight modification of the URL, to request the data from the API:
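library(httr)

# Request the JSON behind the page; content() parses it into a nested list
resp <- content(GET("https://www.vanguardinvestor.co.uk/api/fund-detail/vanguard-ftse-developed-europe-ex-uk-ucits-etf-eur-distributing"))
# Extract the first yield value from the distribution history
yield <- resp$fundData$distributionHistory$yield[[1]]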
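If the request fails or the response changes shape, the snippet above can stop with an unhelpful error or quietly return NULL. A slightly more defensive sketch might look like the following. The names extract_yield and fetch_yield are hypothetical, and the field path is simply the one used above; the live API may differ or change.

```r
# Hypothetical helper: pull the first yield out of an already-parsed response.
# The field path mirrors the snippet above and is an assumption about the API.
extract_yield <- function(parsed) {
  yields <- parsed$fundData$distributionHistory$yield
  if (is.null(yields) || length(yields) == 0) {
    return(NA)  # field missing or empty
  }
  yields[[1]]
}

# Hypothetical wrapper: request the URL, fail loudly on an HTTP error,
# then hand the parsed body to extract_yield()
fetch_yield <- function(api_url) {
  resp <- httr::GET(api_url)
  httr::stop_for_status(resp)
  extract_yield(httr::content(resp, as = "parsed"))
}
```

Calling fetch_yield() with the API URL from the answer should then give either the yield or NA, and an HTTP error would surface as a clear R error rather than a NULL further down.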