How can I see which list element generates a warning/error message in R?

I have a list of about a thousand data frames, and I'm applying a function to all of them with lapply. However, eight of the list elements (data frames) seem to generate this warning message:

In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
  longer argument not a multiple of length of shorter

So basically I'd like to track down the elements that generate the warning so I can make the necessary fixes, but I don't know how. So far I've just been going through the data frames one by one, applying the function to each and seeing whether it triggers the warning (like this: testdata <- my_function(df[[1]], "X")), but as expected, it's taking forever.

CodePudding user response：

I typically find purrr::quietly() to be helpful when I get warnings (for errors, possibly() is better). A function wrapped in quietly() returns a list with the following elements on each iteration:

result
output
warnings
messages

Here is a reprex showing how to identify the dataframes that give you problems:

library(purrr)

# Replace "log" with your function, and the vector with your list of dataframes
res <- 
  c(10, 20, -1) %>% 
  map(quietly(log)) # note the quietly()


# This gives you the first index where you got a warning
res %>% 
  detect_index(~length(.x$warnings) > 0)
#> [1] 3

# With this map you can see the warnings for all dataframes, including those that
# have none. The indices show where all the problems are
res %>% 
  map(~.x$warnings)
#> [[1]]
#> character(0)
#> 
#> [[2]]
#> character(0)
#> 
#> [[3]]
#> [1] "NaNs produced"

# With keep you can see all results from iterations with warnings
res %>% 
  keep(~length(.x$warnings) > 0)
#> [[1]]
#> [[1]]$result
#> [1] NaN
#> 
#> [[1]]$output
#> [1] ""
#> 
#> [[1]]$warnings
#> [1] "NaNs produced"
#> 
#> [[1]]$messages
#> character(0)

Created on 2022-04-05 by the reprex package (v2.0.1)
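
Since detect_index() only returns the first match, here is a minimal sketch for listing every offending position at once. It assumes a hypothetical my_function(df, col) and a toy list dfs standing in for your real function and data; only quietly(), map() and map_lgl() from purrr are used.

```r
library(purrr)

# Hypothetical stand-in for the real function: log() warns on negative input
my_function <- function(df, col) log(df[[col]])

# Toy list standing in for the ~1000 data frames (assumption)
dfs <- list(
  a = data.frame(X = c(1, 2)),
  b = data.frame(X = c(-1, 2)),
  c = data.frame(X = c(3, -4))
)

# Wrap the function once, then call it on every element with the extra argument
quiet_fun <- quietly(my_function)
res <- map(dfs, ~ quiet_fun(.x, "X"))

# TRUE for every element that raised at least one warning
has_warning <- map_lgl(res, ~ length(.x$warnings) > 0)

bad_idx <- unname(which(has_warning))  # positions in the list
bad_names <- names(dfs)[has_warning]   # names, if the list is named
```

With bad_idx in hand you can inspect just those elements, e.g. dfs[bad_idx], instead of going through the whole list by hand.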

CodePudding user response：

You could try it with possibly(). E.g., if some values are not numeric, we cannot divide a number by them (hence an error). For more information on error handling with purrr, see https://aosmith.rbind.io/2020/08/31/handling-errors/

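library(purrr)
library(dplyr)

my_function <- function(x) 20 / x

# possibly() returns `otherwise` (here NULL) instead of throwing an error
find_error <- possibly(.f = my_function, otherwise = NULL)

df <- list(
  df1 = tibble(values = c(1, 2, 3)),
  df2 = tibble(values = c("1", "2", "3"))
)

# Keep only the elements where the function failed, i.e. returned NULL
df %>% 
  map(find_error) %>% 
  keep(~ is.null(.x))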
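
possibly() throws away the error itself, so if you also want to know why an element failed, safely() may be the better fit. A rough sketch, with a hypothetical my_function and toy inputs (plain vectors rather than data frames, to keep it short):

```r
library(purrr)

# Hypothetical function and data: 20 / x fails for character input
my_function <- function(x) 20 / x
inputs <- list(ok = c(1, 2, 3), bad = c("1", "2", "3"))

# safely() returns list(result = ..., error = ...) for each element;
# error is NULL when the call succeeded
res <- map(inputs, safely(my_function))

# Keep only the failures and pull out their error messages
errors <- res %>%
  keep(~ !is.null(.x$error)) %>%
  map_chr(~ conditionMessage(.x$error))

# Names of the elements that failed
names(errors)
```

The result is a named character vector, so each failing element's name sits next to its error message.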