I have a list of about a thousand data frames and I'm applying a function to all of them with lapply(). However, there seem to be eight list elements (data frames) that generate this warning message:
In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
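The call named in the warning, mapply(FUN = f, ..., SIMPLIFY = FALSE), is exactly what base R's Map() runs internally, so the warning most likely comes from a Map() call inside the function being applied, where two inputs have lengths that are not multiples of each other. A minimal sketch reproducing the same message, assuming that is the cause (add_pairs() is a hypothetical example function):

```r
# Map() is a thin wrapper around mapply(FUN = f, ..., SIMPLIFY = FALSE).
add_pairs <- function(a, b) a + b  # hypothetical example function

# Lengths 3 and 2 are not multiples of each other, so mapply() recycles the
# shorter input and signals a warning; tryCatch() captures its message text.
msg <- tryCatch(
  Map(add_pairs, 1:3, 1:2),
  warning = function(w) conditionMessage(w)
)
print(msg)
#> [1] "longer argument not a multiple of length of shorter"

# When the longer length is a multiple of the shorter one, no warning is raised.
ok <- Map(add_pairs, 1:4, 1:2)
```

So a quick thing to check in the offending data frames is whether the vectors your function passes to Map() have compatible lengths.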
So basically I'd like to track down the elements that generate the warning so I can make the necessary fixes to them, but I don't know how. So far I've been going through the data frames individually, applying the function to each one and checking whether it generates the warning (like this: testdata <- my_function(df[[1]], "X")), but as expected, it's taking forever lol.
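Instead of testing elements one by one, a base R sketch can record the warnings for every element in a single lapply() pass. Here my_function() is a hypothetical stand-in that calls Map() on a column and a fixed-length vector, and dfs is a hypothetical toy list; replace both (and the "X" argument) with your own.

```r
# Hypothetical stand-in for the real function: Map() warns when the column
# length is not a multiple of 2.
my_function <- function(df, col) Map(function(a, b) a * b, df[[col]], c(1, 10))

# Hypothetical toy list: elements 2 and 3 have an odd number of rows.
dfs <- list(
  data.frame(X = 1:4),
  data.frame(X = 1:3),
  data.frame(X = 1:5),
  data.frame(X = 1:2)
)

# Apply the function to every element, collecting warning messages per element
# and muffling them so they don't pile up at the top level.
warn_msgs <- lapply(seq_along(dfs), function(i) {
  msgs <- character(0)
  withCallingHandlers(
    my_function(dfs[[i]], "X"),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  msgs
})

# Indices of the data frames that produced at least one warning
bad <- which(lengths(warn_msgs) > 0)
print(bad)
#> [1] 2 3
```

With a named list, which() also returns the names of the offending elements.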
Answer:
I typically find purrr::quietly() helpful when I get warnings (for errors, possibly() is better). A function wrapped with quietly() returns, for each call, a list with the following elements:
- result
- output
- warnings
- messages
Here is a reprex showing how to identify the data frames that give you problems:
library(purrr)
# Replace "log" with your function, and the vector with your list of data frames
res <-
c(10, 20, -1) %>%
map(quietly(log)) # note the quietly()
# This gives you the first index where you got a warning
res %>%
detect_index(~length(.x$warnings) > 0)
#> [1] 3
# This map() shows the warnings for every data frame, including those that
# have none. The position in the list tells you where the problems are
res %>%
map(~.x$warnings)
#> [[1]]
#> character(0)
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> [1] "NaNs produced"
# keep() returns the full results of only those iterations that produced warnings
res %>%
keep(~length(.x$warnings) > 0)
#> [[1]]
#> [[1]]$result
#> [1] NaN
#>
#> [[1]]$output
#> [1] ""
#>
#> [[1]]$warnings
#> [1] "NaNs produced"
#>
#> [[1]]$messages
#> character(0)
Created on 2022-04-05 by the reprex package (v2.0.1)
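Since detect_index() only reports the first hit and the question mentions eight problem elements, a sketch that returns all of them may be closer to what's needed. It assumes a function that takes an extra argument, as in the question; my_function() and dfs are hypothetical stand-ins.

```r
library(purrr)

# Hypothetical stand-in for your function: Map() warns when the column
# length is not a multiple of 2.
my_function <- function(df, col) Map(function(a, b) a * b, df[[col]], c(1, 10))

# Hypothetical toy named list; replace with your own list of data frames.
dfs <- list(
  a = data.frame(X = 1:4),
  b = data.frame(X = 1:3),
  c = data.frame(X = 1:5)
)

# Wrap once, then pass the extra "X" argument through an anonymous function
quiet_fn <- quietly(my_function)
res <- map(dfs, ~ quiet_fn(.x, "X"))

# TRUE for every element whose call captured at least one warning
has_warning <- map_lgl(res, ~ length(.x$warnings) > 0)

# which() returns all positions (named, since dfs is named), not just the first
print(which(has_warning))
```

res$b$warnings then shows the actual message for element "b".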
Answer:
You could try possibly(). For example, if some values are not numeric, we cannot divide a number by them, which raises an error. For more information on error handling with purrr, see https://aosmith.rbind.io/2020/08/31/handling-errors/
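library(purrr)
library(dplyr)
my_function <- function(x) { 20 / x }
# possibly() returns `otherwise` (here NULL) instead of stopping on an error
find_error <- possibly(.f = my_function, otherwise = NULL)
df <- list(
  df1 = tibble(values = c(1, 2, 3)),
  df2 = tibble(values = c("1", "2", "3"))  # character column: 20 / "1" is an error
)
# Keep only the elements for which my_function() failed
df %>%
  map(find_error) %>%
  keep(~ is.null(.x))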
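Note that possibly() and safely() only catch errors, while the question is about a warning. A minimal sketch, assuming it's acceptable to treat warnings as failures: the hypothetical helper warnings_as_errors() re-signals each warning as an error, and safely() (used instead of possibly()) keeps the message so you can see why each element failed. my_function() here is a hypothetical Map()-based stand-in, and plain data.frame() is used so the example needs only purrr.

```r
library(purrr)

# Hypothetical stand-in: Map() warns when the column length is not a
# multiple of 2.
my_function <- function(df) Map(function(a, b) a * b, df$values, c(1, 10))

# Hypothetical helper: re-signal any warning raised by f() as an error so
# that safely() (or possibly()) can catch it.
warnings_as_errors <- function(f) {
  function(...) {
    withCallingHandlers(
      f(...),
      warning = function(w) stop(conditionMessage(w), call. = FALSE)
    )
  }
}

safe_fn <- safely(warnings_as_errors(my_function))

df <- list(
  df1 = data.frame(values = 1:4),
  df2 = data.frame(values = 1:3),
  df3 = data.frame(values = 1:2)
)

res <- map(df, safe_fn)

# Keep only the elements whose call failed, then show which ones and why
failed <- keep(res, ~ !is.null(.x$error))
print(names(failed))
#> [1] "df2"
print(map_chr(failed, ~ conditionMessage(.x$error)))
```

Wrapping the function, rather than setting options(warn = 2) globally, keeps the warning-to-error conversion limited to this one call.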