I am parsing data from multiple links, but some of those links have broken over time, and when I parse them with the rvest
package I get an error or a warning. How can I keep the for-loop going so that it moves on to the next link?
library(rvest)
house_link <- "https://lalafo.kg/bishkek/ads/104-seria-2-komnaty-47-kv-m-s-mebelu-kondicioner-zivotnye-ne-prozivali-id-95221626"
house_features = data.frame()
for(x in seq_along(house_link)) {
  tryCatch({
    page_data = read_html(house_link[x])
    message("Executed.")
  }, error = function(e){
    message('Caught an error!')
    print(e)
  }, warning = function(w){
    message('Caught a warning!')
    print(w)
  }, finally = {
    message('All done, quitting.')
  }
  )
  pricing = page_data %>% html_nodes(".css-13sm4s4") %>%
    html_element("span") %>% html_text()
  house_features = rbind(house_features, data.frame(pricing, stringsAsFactors = FALSE))
}
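Why this loop does not simply move on: page_data is assigned inside the tryCatch() block, so when read_html() fails the assignment never happens. The html_nodes() line after it then runs on the previous iteration's page, or, on the first iteration, fails with "object 'page_data' not found" outside any handler, which stops the loop. A minimal base-R sketch of the difference, with stop() standing in for a failing read_html() so it runs without network access:

```r
# Question's pattern: the assignment lives inside tryCatch(), so a failing
# call leaves page_data untouched (stop() stands in for a failing read_html()).
page_data <- "page from the previous iteration"
tryCatch({
  page_data <- stop("HTTP error 404")  # error is raised before the assignment
}, error = function(e) message("Caught an error!"))
print(page_data)  # still the old value, so later scraping code reuses stale data

# Assigning the *result* of tryCatch() instead: on error, the handler's
# return value (NULL here) becomes the result, and the loop can test for it.
page_data <- tryCatch(stop("HTTP error 404"), error = function(e) NULL)
print(is.null(page_data))
```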
Answer:
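Maybe something like this? The idea is to assign the value returned by tryCatch() to page_data, have the error handler return a placeholder value, and skip to the next link with next() when that placeholder comes back.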
library(rvest)
house_link <- "https://lalafo.kg/bishkek/ads/104-seria-2-komnaty-47-kv-m-s-mebelu-kondicioner-zivotnye-ne-prozivali-id-95221626"
house_features = data.frame()
for(x in 1:3) { # 1:3 forces failing links for testing; with real data use seq_along(house_link)
  cat('Link', x)
  start_time <- Sys.time()  # not used below
  if (x %% 200 == 0) {  # pause every 200 requests to go easy on the server
    print("pausing ...")
    Sys.sleep(5)
  }
  page_data <- tryCatch({
    page <- read_html(house_link[x])
    message("Executed.")
    page  # the last value of the block is what tryCatch() returns on success
  }, error = function(e){
    message('\nCaught an error!')
    return(NULL)  # on error, the handler's return value is assigned to page_data
  }, finally = {cat('Continuing with', x + 1, '\n')})
  if (is.null(page_data)) {  # NULL marks a failed link
    cat('this is a test\n')
    next()  # skip the scraping code and go to the next link
  } else {  # else is not strictly necessary after next(), but it makes the two branches explicit
    pricing = page_data %>% html_nodes(".css-13sm4s4") %>%
      html_element("span") %>% html_text()
    house_features = rbind(house_features, data.frame(pricing, stringsAsFactors = FALSE))
  }
}
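The same skip-on-error logic can also be written without next() and without growing a data frame inside the loop. In this sketch, fetch_price() is a hypothetical stand-in for the read_html()/html_text() steps so it runs offline, and the example.com links are placeholders; each link yields either a one-row data frame or NULL, and the NULLs are dropped before a single do.call(rbind, ...).

```r
# Hypothetical stand-in for read_html() + html_text(): fails for "broken" URLs.
fetch_price <- function(url) {
  if (grepl("broken", url)) stop("HTTP error 404.")
  paste("price for", url)
}

links <- c("https://example.com/a", "https://example.com/broken", "https://example.com/b")

# One element per link: a one-row data frame on success, NULL on failure.
results <- lapply(seq_along(links), function(x) {
  tryCatch(
    data.frame(link = links[x], pricing = fetch_price(links[x]),
               stringsAsFactors = FALSE),
    error = function(e) {
      message("Skipping link ", x, ": ", conditionMessage(e))
      NULL
    }
  )
})

# Drop the failed links, then bind everything once.
results <- Filter(Negate(is.null), results)
house_features <- do.call(rbind, results)
print(house_features)
```

Binding once at the end avoids copying house_features on every iteration. If every link fails, do.call(rbind, list()) returns NULL rather than an empty data frame, so check for that before using the result.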
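Output (every iteration takes the error branch; since house_link holds a single URL, house_link[2] and house_link[3] are NA):
Link 1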
Caught an error!
Continuing with 2
this is a test
Link 2
Caught an error!
Continuing with 3
this is a test
Link 3
Caught an error!
Continuing with 4
this is a test
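The question's loop also passes warning = to tryCatch(). A warning handler there behaves like the error handler: the expression is abandoned, so a page that only raised a warning is treated as a failure and its value is lost. If warnings should be logged but the page kept, withCallingHandlers() with invokeRestart("muffleWarning") resumes the call instead. A minimal base-R sketch, where fetch_with_warning() is a hypothetical function that warns but still returns a value:

```r
# Hypothetical stand-in for a call that warns but still returns a usable value.
fetch_with_warning <- function() {
  warning("example warning")
  "page content"
}

# tryCatch(): the warning handler replaces the result, so the value is lost.
res1 <- tryCatch(fetch_with_warning(), warning = function(w) NULL)

# withCallingHandlers(): log the warning, then resume the call via the
# "muffleWarning" restart, so the function's return value is kept.
res2 <- withCallingHandlers(
  fetch_with_warning(),
  warning = function(w) {
    message("Caught a warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(res2)
```

In the scraping loop, this can be nested inside the tryCatch() that handles errors, so warnings are logged while errors still send the loop to the next link.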