I need to read the dollar-rate table for each bank from https://kursdollar.org, and I have tested this snippet several times:
library(stringr)
library(tidyverse)
library(rvest)
library(httr)
library(RCurl)
curlSetOpt(timeout = 200)  # intended to raise the request timeout
kurs_bi <- "https://kursdollar.org/bank/bi.php"
kurs_mandiri <- "https://kursdollar.org/bank/mandiri.php"
kurs_bca <- "https://kursdollar.org/bank/bca.php"
kurs_bni <- "https://kursdollar.org/bank/bni.php"
kurs_hsbc <- "https://kursdollar.org/bank/hsbc.php"
kurs_panin <- "https://kursdollar.org/bank/panin.php"
kurs_cimb <- "https://kursdollar.org/bank/cimb.php"
kurs_ocbc <- "https://kursdollar.org/bank/ocbc.php"
kurs_bri <- "https://kursdollar.org/bank/bri.php"
kurs_uob <- "https://kursdollar.org/bank/uob.php"
kurs_maybank <- "https://kursdollar.org/bank/maybank.php"
kurs_permata <- "https://kursdollar.org/bank/permata.php"
kurs_mega <- "https://kursdollar.org/bank/mega.php"
kurs_danamon <- "https://kursdollar.org/bank/danamon.php"
kurs_btn <- "https://kursdollar.org/bank/btn.php"
kurs_mayapada <- "https://kursdollar.org/bank/mayapada.php"
kurs_muamalat <- "https://kursdollar.org/bank/muamalat.php"
kurs_bukopin <- "https://kursdollar.org/bank/bukopin.php"
link_kurs <- c(kurs_bi, kurs_mandiri, kurs_bca, kurs_bni, kurs_hsbc, kurs_panin,
               kurs_cimb, kurs_ocbc, kurs_bri, kurs_uob, kurs_maybank, kurs_permata,
               kurs_mega, kurs_danamon, kurs_btn, kurs_mayapada, kurs_muamalat, kurs_bukopin)
for (v in 1:length(link_kurs)) {
  writeLines(paste0(v, ') Read Table on ', link_kurs[v]))
  # open the connection, parse the page, then pull every <table> into a data frame
  open_url <- url(link_kurs[v], "rb")
  extract_df <- read_html(open_url)
  close(open_url)
  extract_df <- extract_df %>%
    html_nodes("table") %>%
    html_table(fill = TRUE) %>%
    as.data.frame()
  writeLines("Test Read Success!")
}
The result can differ from run to run: when a read succeeds it is fast, but sometimes it gets stuck on a particular link (the timeout set through RCurl had no effect) and throws:
Error in url(link_kurs[v], "rb") : cannot open the connection
In addition: Warning message:
In url(link_kurs[v], "rb") :
InternetOpenUrl failed: 'The operation timed out'
Is there any way to work around this? Is there a way to read all of these tables consistently, even if it is a little slow?
CodePudding user response:
Try wrapping the read in tryCatch so a link that fails does not stop the loop:
for (v in 1:length(link_kurs)) {
  writeLines(paste0(v, ') Read Table on ', link_kurs[v]))
  tryCatch({
    open_url <- url(link_kurs[v], "rb")
    extract_df <- read_html(open_url)
    close(open_url)
    extract_df <- extract_df %>%
      html_nodes("table") %>%
      html_table(fill = TRUE) %>%
      as.data.frame()
    writeLines("Test Read Success!")
  }, error = function(e) NULL)
}
Completed version of the tryCatch approach, with a loop that retries each link indefinitely until its table is read (OP edit):
extract_df_list <- list()  # collect one data frame per bank

for (v in 1:length(link_kurs)) {
  writeLines(paste0(v, ') Read Table on ', link_kurs[v]))
  while (TRUE) {
    tryCatch({
      open_url <- url(link_kurs[v], "rb")
      extract_df <- read_html(open_url)
      close(open_url)
      extract_df <- extract_df %>%
        html_nodes("table") %>%
        html_table(fill = TRUE) %>%
        as.data.frame()
      extract_df_list <- c(extract_df_list, list(extract_df))
      writeLines("Test Read Success!")
      break
    }, error = function(e) {
      message("Test Read Timeout")
      message("Retrying. .")
    })
  }
}
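If retrying forever is a concern (a permanently dead link would hang the whole run), a bounded alternative is to fetch each page with httr, which supports a per-request timeout; options set with RCurl::curlSetOpt() only apply to RCurl requests such as getURL(), not to url() or read_html(), which is why the timeout above had no effect. A minimal sketch, assuming the same link_kurs vector and that 5 attempts of 30 seconds each are enough (both numbers are arbitrary):
library(httr)
library(rvest)

extract_df_list <- list()
for (v in seq_along(link_kurs)) {
  writeLines(paste0(v, ') Read Table on ', link_kurs[v]))
  # RETRY() re-issues the GET with back-off, up to `times` attempts;
  # timeout(30) caps each attempt at 30 seconds (assumed values)
  resp <- RETRY("GET", link_kurs[v], timeout(30), times = 5)
  stop_for_status(resp)
  extract_df <- read_html(content(resp, as = "text", encoding = "UTF-8")) %>%
    html_nodes("table") %>%
    html_table(fill = TRUE) %>%
    as.data.frame()
  extract_df_list <- c(extract_df_list, list(extract_df))
  writeLines("Test Read Success!")
}
If a link still fails after every attempt, the error propagates, so the tryCatch pattern from the answer can be wrapped around RETRY() to skip that link instead of stopping the run.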