purrr safely over a function and save the links which are giving errors

Time:04-16

I have some links:

myLinks = c("https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l", 
"https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l", 
"https://www.fotocasa.es/es/comprar/viviendas/vilafranca-del-penedes/todas-las-zonas/l", 
"https://www.fotocasa.es/es/comprar/viviendas/vilobi-del-penedes/todas-las-zonas/l"
)

One of the links returns an error, but I don't want to drop it; I just want to store the link so I can inspect it later.

Data/Code:

library(RSelenium)
library(rvest)
library(tidyverse)

rD <- rsDriver(browser="firefox", port=4536L)
remDr <- rD[["client"]]

collectZonaLinkData <- function(zona_url_to_get){
  
  remDr$navigate(zona_url_to_get)
  #click on Distrito
  remDr$findElement(using = "xpath", '/html/body/div[1]/div[2]/div[1]/div[3]/div/div[1]/div')$clickElement()
  html_zona_full_page = remDr$getPageSource()[[1]] %>% 
    read_html()
  
  Zonas_Names = html_zona_full_page %>% 
    html_nodes('.re-GeographicSearchNext-checkboxItem.is-checked') %>% # only interested in the checked name boxes
    html_nodes('.re-GeographicSearchNext-checkboxItem-literal') %>% 
    html_text()
  
  Zonas_Link  = html_zona_full_page %>% 
    html_nodes('.re-GeographicSearchNext-checkboxItem.is-checked') %>% 
    html_attr('href') %>% 
    paste("https://www.fotocasa.es", ., sep = "")
  
  zonas = cbind.data.frame(Zonas_Names, Zonas_Link)
  return(zonas)
}

I can run the following:

out = map(myLinks, ~ collectZonaLinkData(.x)) %>% 
  set_names(myLinks) %>% 
  bind_rows(.id = "ID")

Which gives the following error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1

The problem url is the following:

collectZonaLinkData(myLinks[3])
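
The row counts in the message (0 and 1) are consistent with the CSS selectors matching nothing on that page. A minimal sketch, assuming that is the cause (the empty vectors below are stand-ins, not scraped data):

```r
# Assumption: html_text() and html_attr() both returned zero-length vectors
Zonas_Names <- character(0)
href <- character(0)

# paste() treats a zero-length argument as "", so the result has length 1
Zonas_Link <- paste("https://www.fotocasa.es", href, sep = "")

length(Zonas_Names)  # 0
length(Zonas_Link)   # 1

# Columns of length 0 and 1 cannot be combined, so this call fails
res <- try(cbind.data.frame(Zonas_Names, Zonas_Link), silent = TRUE)
```

If this is what happens, the error comes from building the data frame rather than from the navigation step.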

How can I wrap collectZonaLinkData inside safely() so that a failing link still appears in the data frame, with NA in Zonas_Names and Zonas_Link?

i.e. running the following:

myLinks = myLinks[1:2]
out = map(myLinks, ~ collectZonaLinkData(.x)) %>% 
  set_names(myLinks) %>% 
  bind_rows(.id = "ID")

Gives me the output for 2 links which work:

                                                                                ID       Zonas_Names
1        https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l        Torrelavit
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l Torrelles de Foix
                                                                        Zonas_Link
1        https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l

The 3rd link doesn't work, so only its ID can be collected; for Zonas_Names and Zonas_Link I would like NA in that row.

I am not sure whether I should instead wrap safely() around the Zonas_Names and Zonas_Link steps inside collectZonaLinkData.

Expected output:

                                                                                     ID       Zonas_Names
1             https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l        Torrelavit
2      https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l Torrelles de Foix
3 https://www.fotocasa.es/es/comprar/viviendas/vilafranca-del-penedes/todas-las-zonas/l                NA
                                                                        Zonas_Link
1        https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l
3                                                                               NA

EDIT:

Output:

# A tibble: 1 × 4
  `https://www.fotocasa.… `https://www.fotocasa.es… $Zonas_Link         `https://www.fotocasa.e… `https://www.fotocasa.e… $Zonas_Link       
  <lgl>                   <fct>                     <fct>               <lgl>                    <fct>                    <fct>             
1 NA                      Torrelles de Foix         https://www.fotoca… NA                       Vilobí del Penedès       https://www.fotoc…

CodePudding user response:

We can wrap the function with possibly (or safely), so that a failing call returns the `otherwise` value instead of aborting the whole map:

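# Fallback columns must match the function's output exactly (Zonas_Link, capital L).
# ID is left out here because bind_rows(.id = "ID") adds it from the list names.
pcollectZonaLinkData <- possibly(collectZonaLinkData, 
   otherwise = tibble(Zonas_Names = NA_character_, 
    Zonas_Link = NA_character_))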

and then use this function in map

library(purrr)
library(dplyr)
out <- map(myLinks, ~ pcollectZonaLinkData(.x)) %>% 
  set_names(myLinks) %>% 
  bind_rows(.id = "ID")
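
If the failing links should also be stored separately for inspection, safely can be used instead of possibly. The following is only a sketch: `fake_collect` and the example.com URLs are hypothetical stand-ins for collectZonaLinkData and myLinks, so that the code runs without a Selenium session.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for collectZonaLinkData(): fails for one URL
fake_collect <- function(url) {
  if (grepl("bad", url)) stop("no checked zone found")
  data.frame(Zonas_Names = basename(url), Zonas_Link = url,
             stringsAsFactors = FALSE)
}

links <- c("https://example.com/a",
           "https://example.com/bad",
           "https://example.com/c")

# safely() returns list(result = ..., error = ...) for every call
safe_collect <- safely(fake_collect)
results <- map(links, safe_collect) %>%
  set_names(links)

# Links whose call raised an error, kept for later inspection
failed_links <- names(keep(results, function(res) !is.null(res$error)))

# One NA row per failed link, the real result otherwise
na_row <- data.frame(Zonas_Names = NA_character_, Zonas_Link = NA_character_,
                     stringsAsFactors = FALSE)
out <- results %>%
  map(function(res) if (is.null(res$error)) res$result else na_row) %>%
  bind_rows(.id = "ID")
```

Here `failed_links` holds the URLs that errored, and `out` keeps one row per link, with NA in Zonas_Names and Zonas_Link for the failures. The error objects themselves remain available in `results` if the messages are needed.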