Filter across columns in dplyr

Time:11-24

I want to filter the iris data frame so that it returns only rows where the value is greater than 2 in each of the Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width columns, using the filter() and across() functions. I have the code below:

iris%>%
  filter(across(c(Sepal.Length, Sepal.Width , Petal.Length, Petal.Width), >2))

Running it gives the error message: Error: unexpected '>' in:

Can anyone suggest amendments to the code to solve this?

CodePudding user response:

Two possibilities

iris %>%
  filter(across(c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), `>`, 2))

iris %>%
  filter(across(c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), ~ .x > 2))

# or

iris %>%
  filter(across(c(Sepal.Length, Sepal.Width , Petal.Length, Petal.Width), function(x) x > 2))

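Let's start with the second example. It uses an anonymous function, written in two notations: the first is the purrr-style formula (~ .x > 2), and the second is, let's call it, the classic style (function(x) x > 2). The purrr-style formula works only in functions that support it, such as those in purrr and dplyr.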

Now the first one. across() expects a function as its second argument, and an operator such as > can only be passed as a function in its prefix form (see Advanced R). Every operator in R has a prefix form, although you rarely need to use it. For example:

2 + 2
`+`(2, 2)

are the same.

When you pass a function as the second argument of across(), any arguments you add after it are passed on to that function. For >, the first argument is the values of each selected iris column, and the second argument is 2, i.e. the number you chose to compare the column values against.
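To make the mechanics above concrete, here is a minimal base R sketch of roughly what filter(across(cols, `>`, 2)) computes. It is only an illustration, not how dplyr is implemented, and the names x, cols, per_column, keep and result_base are made up for this example.

```r
# Infix and prefix forms of the same comparison give identical results.
x <- c(1.5, 2, 3.2)
infix  <- x > 2
prefix <- `>`(x, 2)
stopifnot(identical(infix, prefix))

# Hypothetical helper vector naming the four measurement columns of iris.
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Call `>`(column, 2) on each column; the 2 is passed on as the second argument.
per_column <- lapply(iris[cols], `>`, 2)

# Keep a row only if the comparison is TRUE in every column.
keep <- Reduce(`&`, per_column)
result_base <- iris[keep, ]

head(result_base)
```

The extra 2 after the function name plays the same role here as it does in the across() call: it is forwarded to `>` as its second argument.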

CodePudding user response:

A possible solution, based on dplyr:

library(dplyr)

iris %>%
  filter(across(where(is.numeric), ~ .x > 2))

Or:

iris%>%
  filter(across(c(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width), ~ .x > 2))

Or even:

iris%>%
  filter(across(ends_with(c("Length","Width")), ~ .x > 2))
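As far as I know, newer dplyr releases (1.0.4 onward) also provide if_all() and if_any(), and the dplyr documentation recommends them over across() inside filter(). The sketch below assumes such a version is installed; all_above and any_above are names chosen just for this example.

```r
library(dplyr)

# if_all() keeps rows where the predicate is TRUE for every selected column;
# where(is.numeric) picks the four numeric columns of iris.
all_above <- iris %>%
  filter(if_all(where(is.numeric), ~ .x > 2))

# if_any() is the "at least one column" counterpart.
any_above <- iris %>%
  filter(if_any(where(is.numeric), ~ .x > 2))

nrow(all_above)
nrow(any_above)
```

if_all() states the "every column must pass" intent explicitly, which is what the question asks for.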

CodePudding user response:

A potential solution:

iris %>% filter(Sepal.Length > 2 & Sepal.Width >2 & Petal.Length >2 & Petal.Width >2)

And the condensed version:

iris %>% filter_at(vars(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),all_vars(.>2))

CodePudding user response:

Hi, another possibility, since the variables you are using have similar names, is:

iris_filter_contain = iris %>% 
  filter(across(c(contains("Petal"), starts_with("Sepal")), ~ .x > 2))
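When relying on selection helpers, it can help to check which columns they actually match before filtering, because a helper that matches nothing silently drops out of the condition. A small sketch using select() and names(): for instance, ends_with("Sepal") matches no column in iris, whereas starts_with("Sepal") and contains("Sepal") match both sepal columns.

```r
library(dplyr)

# No iris column name ends with "Sepal", so this selects nothing.
names(select(iris, ends_with("Sepal")))

# Both of these match Sepal.Length and Sepal.Width.
names(select(iris, starts_with("Sepal")))
names(select(iris, contains("Sepal")))
```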