Error handling function not working within dplyr::mutate

Time:11-20

Why is R complaining about an error when my function already handles errors?

I've created a function to grab the parent element of an href attribute, which is invariably an "<a>" element. The function has some error handling so that it returns NA if it can't find the href attribute.

The function works just fine in isolation, but not in combination with dplyr::mutate. I can't figure out why that is.

Minimal reproducible example:

library(dplyr)
library(rvest)

# Create html doc
html.test <- "<a href=\"hello\"></a><a id=\"ctl00_ctl00_btnSearch\" data-action=\"search\" class=\"go\" href=\"javascript:__doPostBack('ctl00$ctl00$btnSearch','')\"><span>GO</span><i class=\"fal fa-search\"></i></a>" %>%
  minimal_html()

# Create function
fun.get.node.name <- function(href.target){
  # treat warnings as errors
  options(warn=2)  
  
  xpath <- paste0("//a/@href[.= \'", href.target, "\']/..")
  
  res <- try({
    node_name <- html_nodes(x = html.test, xpath = xpath) %>% html_name()
  }, silent = TRUE)
  
  if (inherits(res, "try-error")) {
    # print warnings as they occur
    options(warn=1)  
    return(NA)
  } else {
    # print warnings as they occur
    options(warn=1)
    return(node_name)
  }
}

Now, if I apply the function to the attribute href = "hello", it works fine both in isolation and when applied within dplyr::mutate:

href.target <- "hello"
fun.get.node.name(href.target)
[1] "a"

data.frame(href = href.target) %>% mutate(node_name = fun.get.node.name(href.target = href))
   href node_name
1 hello         a

But if I apply the same function to the attribute href = "javascript:__doPostBack('ctl00$ctl00$btnSearch','')", which for some reason can't be found, the function works only in isolation and NOT when applied within dplyr::mutate:

href.target <- "javascript:__doPostBack('ctl00$ctl00$btnSearch','')"
fun.get.node.name(href.target)
[1] NA

data.frame(href = href.target) %>% mutate(node_name = fun.get.node.name(href.target = href))
 Error: (converted from warning) Problem while computing `node_name = fun.get.node.name(href.target = href)`.
ℹ Invalid predicate [1206] 

Why is R complaining about an error when the function already handles errors?
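A likely reason this href "can't be found" is that its value contains single quotes. Pasted between '...', the first ' in __doPostBack(' ends the XPath string literal early, and libxml2 reports the leftover text as an invalid predicate (the [1206] in the message appears to be libxml2's invalid-predicate error code). So the lookup fails both in isolation and inside mutate(); only what happens to the resulting warning differs. XPath 1.0 string literals have no escape sequences, so a common workaround is to quote with the character the value does not contain, or to build the string with concat() when it contains both. A minimal sketch, where xpath_literal() is a hypothetical helper and not part of rvest or xml2:

```r
# Hypothetical helper (not part of rvest or xml2): quote a string as an
# XPath 1.0 string literal. XPath 1.0 has no escape sequences, so use a
# quote character the value does not contain, or concat() if it has both.
xpath_literal <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  if (!grepl("'", x, fixed = TRUE)) {
    return(paste0("'", x, "'"))
  }
  if (!grepl('"', x, fixed = TRUE)) {
    return(paste0('"', x, '"'))
  }
  # Both quote types present: close the '...' literal at each single
  # quote, insert "'" as its own concat() argument, and reopen.
  paste0("concat('", gsub("'", "', \"'\", '", x, fixed = TRUE), "')")
}

href.target <- "javascript:__doPostBack('ctl00$ctl00$btnSearch','')"
xpath <- paste0("//a/@href[. = ", xpath_literal(href.target), "]/..")
xpath
#> [1] "//a/@href[. = \"javascript:__doPostBack('ctl00$ctl00$btnSearch','')\"]/.."
```

The resulting xpath can be passed to html_nodes() in the same way as in the function above; this sketch has not been run against the example document.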

Answer:

Your function handles errors correctly, but the error message says the error was converted from a warning. So your function should wrap the lookup in suppressWarnings() as well; then it works as expected.

Although this solves your problem, it is still not clear why the warning becomes an uncaught error inside mutate() but not outside of it.

library(dplyr)
library(rvest)


# Create html doc
html.test <- "<a href=\"hello\"></a><a id=\"ctl00_ctl00_btnSearch\" data-action=\"search\" class=\"go\" href=\"javascript:__doPostBack('ctl00$ctl00$btnSearch','')\"><span>GO</span><i class=\"fal fa-search\"></i></a>" %>%
  minimal_html()

# Create function
fun.get.node.name <- function(href.target){
  # treat warnings as errors
  options(warn=2)  
  
  xpath <- paste0("//a/@href[.= \'", href.target, "\']/..")
  
  res <- try({
    node_name <- suppressWarnings(
      html_nodes(x = html.test, xpath = xpath) %>% html_name()
    )
  }, silent = TRUE)
  
  if (inherits(res, "try-error")) {
    # print warnings as they occur
    options(warn=1)  
    return(NA)
  } else {
    # print warnings as they occur
    options(warn=1)
    return(node_name)
  }
}

href.target <- "javascript:__doPostBack('ctl00$ctl00$btnSearch','')"
fun.get.node.name(href.target)
#> [1] NA

data.frame(href = href.target) %>%
  mutate(node_name = fun.get.node.name(href.target = href))
#>                                                      href node_name
#> 1 javascript:__doPostBack('ctl00$ctl00$btnSearch','')        NA

Created on 2022-11-19 with reprex v2.0.2
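As for why the warning only escapes inside mutate(), which the answer above leaves open: one plausible explanation, assuming mutate() wraps the column expression in a calling handler that catches each warning and re-signals it with the "Problem while computing ..." context (the wording of the error points that way, but check the dplyr source for your version). R runs a calling handler with only the handlers established outside it still active. The re-signalled warning is therefore turned into an error by warn = 2 at a point where the try() inside the function is no longer visible, so the error propagates out of mutate(). suppressWarnings() fixes this because its handler sits inside the function and muffles the original warning before mutate()'s handler ever sees it. This base-R sketch reproduces the effect without dplyr or rvest; lookup() and fake_mutate() are hypothetical stand-ins, not dplyr's actual code:

```r
# Hypothetical stand-in for the question's function: warn = 2 turns
# warnings into errors, and try() is expected to catch them.
lookup <- function() {
  old <- options(warn = 2)
  on.exit(options(old))  # restore warn even if an error escapes
  res <- try(warning("Invalid predicate [1206]"), silent = TRUE)
  if (inherits(res, "try-error")) NA else res
}

# Hypothetical stand-in for mutate(): a calling handler that re-signals
# each warning with extra context. While this handler runs, only handlers
# established outside fake_mutate() are active, so the try() inside
# lookup() cannot catch the re-signalled warning once warn = 2 makes it
# an error.
fake_mutate <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      warning("Problem while computing: ", conditionMessage(w), call. = FALSE)
      invokeRestart("muffleWarning")
    }
  )
}

lookup()
#> [1] NA

# Catch the escaping error here so the script keeps running.
msg <- tryCatch(fake_mutate(lookup()), error = conditionMessage)
msg
#> [1] "(converted from warning) Problem while computing: Invalid predicate [1206]"
```

The on.exit() call is a small change from the question's function: there, options(warn = 1) is only reached on the normal return path, so when the error escapes inside mutate() the session is left with warn = 2.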

Answer:

Using the insight provided by @TimTeaFan, I have a solution that relies on the fact that suppressWarnings() simply returns the value of the expression it wraps, which here is an empty character vector when the href cannot be found. So I don't need to go down the try() error-handling path at all:

# Create function
fun.get.node.name <- function(href.target){

  xpath <- paste0("//a/@href[.= \'", href.target, "\']/..")
  
  node_name <- suppressWarnings(
    html_nodes(x = html.test, xpath = xpath) %>% html_name()
  )
  
  if (length(node_name) == 0){
    return(NA)
  } else {
    return(node_name)
  }
}

# Run
data.frame(href = href.target) %>% mutate(node_name = fun.get.node.name(href.target = href))

#>                                                      href node_name
#> 1 javascript:__doPostBack('ctl00$ctl00$btnSearch','')        NA
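Both answers rely on suppressWarnings(). A sketch of an alternative that avoids changing the global warn option: an exiting warning handler from tryCatch(), established inside the function, handles the warning before any calling handler further out (such as the one mutate() appears to install) can see it. Note also that mutate() passes the whole href column to the function in one call, while the function builds its XPath on the assumption of a single href, so for data frames with more than one row it is safer to apply the lookup element-wise, for example with vapply(). fake_lookup(), get_node_name_1() and get_node_name() are hypothetical names; fake_lookup() only imitates the rvest call so the sketch runs without rvest.

```r
# Hypothetical stand-in for html_nodes(...) %>% html_name(): warns and
# returns character(0) for hrefs containing a single quote, mimicking
# the invalid XPath in the question.
fake_lookup <- function(href) {
  if (grepl("'", href, fixed = TRUE)) {
    warning("Invalid predicate [1206]")
    return(character(0))
  }
  "a"
}

# Look up one href; a warning, an error, or an empty result becomes NA.
# The exiting tryCatch() handlers are established inside this function,
# so they run before any calling handler further out, and no global
# options are changed.
get_node_name_1 <- function(href, lookup = fake_lookup) {
  res <- tryCatch(
    lookup(href),
    warning = function(w) character(0),
    error = function(e) character(0)
  )
  if (length(res) == 0) NA_character_ else res[[1]]
}

# mutate() passes the whole href column at once, so apply the lookup
# element-wise and return exactly one string per row.
get_node_name <- function(hrefs, lookup = fake_lookup) {
  vapply(hrefs, get_node_name_1, character(1), lookup = lookup,
         USE.NAMES = FALSE)
}

get_node_name(c("hello", "javascript:__doPostBack('ctl00$ctl00$btnSearch','')"))
#> [1] "a" NA
```

With rvest, a lookup along the lines of function(h) html_nodes(html.test, xpath = paste0("//a/@href[.= '", h, "']/..")) %>% html_name() could be passed as lookup, and the call used as mutate(node_name = get_node_name(href, lookup = ...)). res[[1]] keeps only the first match when several nodes match.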
