I am using the curl package to scrape sentiment data, but I need it to wait several seconds before it scrapes. This is my initial code:
library(stringr)
library(curl)

links <- "https://www.dailyfx.com/sentiment"
con <- curl(links)
open(con)
html_string <- readLines(con, n = 3000)
close(con)
html_string[1580:1700]  # the data value property is "--" in this case
How do I add the wait properly?
CodePudding user response:
Sys.sleep(1)
This pauses execution for one second before the program resumes.
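For placement, here is a minimal sketch against the original snippet (the 5-second duration is arbitrary; note that sleeping only pauses your script and does not change what the server sends back):
library(curl)

links <- "https://www.dailyfx.com/sentiment"
con <- curl(links)
open(con)
Sys.sleep(5)  # pause a few seconds before reading
html_string <- readLines(con, n = 3000)
close(con)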
CodePudding user response:
Special thanks to @MrFlick for pointing out the situation:
curl will only pull the source code for that web page. The data shown on that page is loaded via JavaScript after the page loads; it is not contained in the page source. If you want to interact with a page that uses JavaScript, you'll need to use something like RSelenium instead. Or you'll need to reverse-engineer the JavaScript to see where the data comes from, and then perhaps make a curl request to the data endpoint directly rather than to the HTML page.
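If you go the endpoint route, here is a sketch of what that might look like with curl and jsonlite. The URL below is a placeholder, not a confirmed endpoint; the real one would have to be found in your browser's network tab:
library(curl)
library(jsonlite)

# Hypothetical endpoint: inspect the Network tab in your browser's
# developer tools to find the URL the page's JavaScript actually requests.
endpoint <- "https://www.dailyfx.com/api/sentiment"

res <- curl_fetch_memory(endpoint)
if (res$status_code == 200) {
  sentiment <- fromJSON(rawToChar(res$content))
  str(sentiment)  # inspect the parsed structure
}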
With that said, I used RSelenium to accomplish this the way I wanted:
library(RSelenium)
library(rvest)
library(tidyverse)
library(stringr)

# Start a Selenium-driven Chrome session; chromever must match the installed Chrome version
rD <- rsDriver(browser = "chrome", verbose = FALSE, chromever = "103.0.5060.134")
remDr <- rD[["client"]]
remDr$navigate("https://www.dailyfx.com/sentiment")
Sys.sleep(10)  # give the page time to load fully, including the JavaScript-rendered values

html <- remDr$getPageSource()[[1]]
html_obj <- read_html(html)
# Take the buy and sell sentiment of a specific asset
buy_sentiment <- html_obj %>%
  html_nodes(".dfx-technicalSentimentCard__netLongContainer") %>%
  html_children()
buy_sentiment <- as.character(buy_sentiment[[15]])  # 15th card corresponds to the asset of interest
buy_sentiment <- as.numeric(str_match(buy_sentiment, "[0-9]+"))

sell_sentiment <- html_obj %>%
  html_nodes(".dfx-technicalSentimentCard__netShortContainer") %>%
  html_children()
sell_sentiment <- as.character(sell_sentiment[[15]])
sell_sentiment <- as.numeric(str_match(sell_sentiment, "[0-9]+"))
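When you are done, close the browser and stop the Selenium server so the Chrome process does not linger:
remDr$close()
rD$server$stop()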