I have some links:
myLinks = c("https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l",
"https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l",
"https://www.fotocasa.es/es/comprar/viviendas/vilafranca-del-penedes/todas-las-zonas/l",
"https://www.fotocasa.es/es/comprar/viviendas/vilobi-del-penedes/todas-las-zonas/l"
)
One of the links returns an error, but I don't want to remove it; I just want to record the link so I can inspect it further.
Data/Code:
library(RSelenium)
library(rvest)
library(tidyverse)
rD <- rsDriver(browser="firefox", port=4536L)
remDr <- rD[["client"]]
collectZonaLinkData <- function(zona_url_to_get){
  remDr$navigate(zona_url_to_get)
  # click on Distrito
  remDr$findElement(using = "xpath", '/html/body/div[1]/div[2]/div[1]/div[3]/div/div[1]/div')$clickElement()
  html_zona_full_page = remDr$getPageSource()[[1]] %>%
    read_html()
  Zonas_Names = html_zona_full_page %>%
    html_nodes('.re-GeographicSearchNext-checkboxItem.is-checked') %>% # only interested in the checked name boxes
    html_nodes('.re-GeographicSearchNext-checkboxItem-literal') %>%
    html_text()
  Zonas_Link = html_zona_full_page %>%
    html_nodes('.re-GeographicSearchNext-checkboxItem.is-checked') %>%
    html_attr('href') %>%
    paste("https://www.fotocasa.es", ., sep = "")
  zonas = cbind.data.frame(Zonas_Names, Zonas_Link)
  return(zonas)
}
I can run the following:
out = map(myLinks, ~ collectZonaLinkData(.x)) %>%
set_names(myLinks) %>%
bind_rows(.id = "ID")
Which gives the following error:
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1
The problem url is the following:
collectZonaLinkData(myLinks[3])
How can I wrap collectZonaLinkData inside safely() and make sure that Zonas_Link contains an NA in the data frame?
i.e. running the following:
myLinks = myLinks[1:2]
out = map(myLinks, ~ collectZonaLinkData(.x)) %>%
set_names(myLinks) %>%
bind_rows(.id = "ID")
Gives me the output for 2 links which work:
ID Zonas_Names
1 https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l Torrelavit
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l Torrelles de Foix
Zonas_Link
1 https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l
The 3rd link doesn't work, so only the ID can be collected; for Zonas_Names and Zonas_Link I would like an NA in those columns.
I am not sure whether I should wrap the safely() function around Zonas_Names and Zonas_Link inside collectZonaLinkData?
Expected output:
ID Zonas_Names
1 https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l Torrelavit
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l Torrelles de Foix
3 https://www.fotocasa.es/es/comprar/viviendas/vilafranca-del-penedes/todas-las-zonas/l NA
Zonas_Link
1 https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l
3 NA
EDIT:
Output:
# A tibble: 1 × 4
`https://www.fotocasa.… `https://www.fotocasa.es… $Zonas_Link `https://www.fotocasa.e… `https://www.fotocasa.e… $Zonas_Link
<lgl> <fct> <fct> <lgl> <fct> <fct>
1 NA Torrelles de Foix https://www.fotoca… NA Vilobí del Penedès https://www.fotoc…
CodePudding user response:
We can pass the function to possibly() (or safely()):
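# possibly() returns the `otherwise` value instead of raising an error.
# The column names must match those returned by collectZonaLinkData;
# the ID column is added later by bind_rows(.id = "ID").
pcollectZonaLinkData <- possibly(collectZonaLinkData,
  otherwise = tibble(Zonas_Names = NA_character_,
                     Zonas_Link = NA_character_))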
and then use this function in map
library(purrr)
library(dplyr)
out <- map(myLinks, ~ pcollectZonaLinkData(.x)) %>%
set_names(myLinks) %>%
bind_rows(.id = "ID")
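To try the pattern without a Selenium session, here is a minimal sketch with a stand-in scraper. The function fake_collect, the example.com URLs and the "broken" marker are all hypothetical and not part of the original code. The stand-in fails the same way as the question's function: paste() returns a length-1 string even when the scraped vector is empty, so building the data frame raises the "differing number of rows: 0, 1" error. The last lines show how safely() could be used instead if you also want to list the failing links.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for collectZonaLinkData(): no browser needed.
fake_collect <- function(url) {
  found <- !grepl("broken", url, fixed = TRUE)
  zonas_names <- if (found) "Some Zone" else character(0)
  hrefs <- if (found) "/es/some-zone" else character(0)
  # paste() yields one string even when `hrefs` is empty,
  # so the two columns end up with lengths 0 and 1.
  zonas_link <- paste("https://example.com", hrefs, sep = "")
  data.frame(Zonas_Names = zonas_names, Zonas_Link = zonas_link,
             stringsAsFactors = FALSE)
}

# Fallback row whose column names match the normal return value.
fallback <- tibble(Zonas_Names = NA_character_, Zonas_Link = NA_character_)
safe_collect <- possibly(fake_collect, otherwise = fallback)

urls <- c("https://example.com/ok-1",
          "https://example.com/broken",
          "https://example.com/ok-2")

# The failing URL keeps its ID and gets NA in the other columns.
out <- map(urls, safe_collect) %>%
  set_names(urls) %>%
  bind_rows(.id = "ID")

# Alternative: safely() keeps the error object, so failing links can be listed.
res <- map(urls, safely(fake_collect)) %>% set_names(urls)
failed <- names(keep(res, ~ !is.null(.x$error)))
```

With possibly() the error message itself is discarded; safely() is the one to reach for if the message matters for later inspection.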