Using any_of inside of filter R

Time:01-05

I'm trying to use any_of() within filter() to handle variable names that may or may not be present in a data frame when I run a function across it. When I do this, however, I get an error saying that any_of() must be used within a *selecting* function. Is it possible to write a filter that tolerates missing columns, or do I need to name the expected column explicitly?

I've made a quick example that shows the issue and would be very interested in any workarounds or suggestions.

data("iris")

iris %>%
  mutate(Sepal.Area = Sepal.Length*Sepal.Width,
         Petal.Area = Petal.Length*Petal.Width) %>%
  filter(if_all(starts_with("Sepal"), ~.>4),
         any_of(c("Petal.Area", "Petal.Diameter"), ~.>2))

CodePudding user response:

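any_of() is a selection helper, so inside filter() it has to be wrapped in if_all() (or if_any()). Alternatively, select the columns with matches() and a regular expression: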

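iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         Petal.Area = Petal.Length * Petal.Width) %>%
  # matches() takes a regex: escape the dot and anchor the pattern
  # so that only these exact column names are selected
  filter(if_all(starts_with("Sepal"), ~ .x > 4),
         if_all(matches("^Petal\\.(Area|Diameter)$"), ~ .x > 2))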

Or, keeping any_of() and wrapping it in if_all():

iris %>%
    filter(if_all(any_of(c("Petal.Length", "hello")), ~ .x > 2), 
        if_all(starts_with("Sepal"), ~ .x > 4))
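For the use case in the question (a function run across data frames that may or may not contain a given column), the same pattern can be wrapped in a small helper. This is only a sketch: the name filter_present and its arguments are made up for illustration and are not part of dplyr.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep the rows where every
# candidate column that actually exists in `df` exceeds `threshold`.
# Candidate names that are absent from `df` are silently skipped by any_of().
filter_present <- function(df, candidates, threshold) {
  df %>%
    filter(if_all(any_of(candidates), ~ .x > threshold))
}

with_area <- iris %>%
  mutate(Petal.Area = Petal.Length * Petal.Width)

# "Petal.Diameter" does not exist, so only Petal.Area is tested
res <- filter_present(with_area, c("Petal.Area", "Petal.Diameter"), 2)
```

Because the candidate names are passed as a character vector, the caller never has to know in advance which of them a particular data frame contains.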

These are two different things. any_of() is a selection helper: it selects only those columns that exist in the data and does not raise an error when some of the named columns are missing. if_all() applies the condition to each selected column and returns TRUE for a row only if the condition holds in every selected column; if_any() returns TRUE if it holds in at least one of them. For example:

> d1 <- data.frame(col1 = 1:3, col2 = -1:1, col3 = 2:4)
# col4 is not found
> d1 %>%
    filter(if_all(any_of(c("col1", "col2", "col4")), ~ .x > 0))
  col1 col2 col3
1    3    1    4

> d1 %>%
   filter(if_any(c("col1", "col2", "col4"), ~ .x > 0))
Error in `filter()`:
! Problem while expanding `..1 = if_any(c("col1", "col2", "col4"), ~.x > 0)`.
Caused by error in `if_any()`:
! Can't select columns that don't exist.
✖ Column `col4` doesn't exist.

> d1 %>%
   filter(if_any(c("col1", "col2", "col3"), ~ .x > 0))
  col1 col2 col3
1    1   -1    2
2    2    0    3
3    3    1    4
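The if_any() call that failed above can be made tolerant of the missing column in the same way, by wrapping the names in any_of(). A minimal sketch reusing the d1 data from the example (the result names all_pos and any_pos are only for illustration):

```r
library(dplyr)

d1 <- data.frame(col1 = 1:3, col2 = -1:1, col3 = 2:4)

# col4 does not exist; any_of() drops it instead of raising an error
wanted <- c("col1", "col2", "col4")

# every selected column must be > 0: only the third row qualifies
all_pos <- d1 %>%
  filter(if_all(any_of(wanted), ~ .x > 0))

# at least one selected column must be > 0: col1 is positive in every row
any_pos <- d1 %>%
  filter(if_any(any_of(wanted), ~ .x > 0))
```

Here all_pos keeps one row and any_pos keeps all three, which shows that any_of() only decides which columns are tested, while if_all() and if_any() decide how the per-column results are combined.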