Add a column count if values in multiple columns meet threshold conditions: R

Time:01-17

Consider the iris dataset. Let's say I want to create a column, count, that holds, for each row, the number of "Sepal" columns whose values are between 1 and 5 (inclusive).

Here's what I have:

iris %>% rowwise() %>%
  mutate(count = sum(if_any(contains("sepal", ignore.case = TRUE), 
                        .fns = ~ between(.x, 1, 5)))) %>%
  arrange(desc(count))

But the output is not what I want.

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species count
          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <int>
 1          5.1         3.5          1.4         0.2 setosa      1 # Should be 1
 2          4.9         3            1.4         0.2 setosa      1 # Should be 2
 3          4.7         3.2          1.3         0.2 setosa      1 # Should be 2
 4          4.6         3.1          1.5         0.2 setosa      1 # Should be 2
 5          5           3.6          1.4         0.2 setosa      1 # Should be 2
 6          5.4         3.9          1.7         0.4 setosa      1 # Should be 1
 7          4.6         3.4          1.4         0.3 setosa      1 # Should be 2
 8          5           3.4          1.5         0.2 setosa      1 # Should be 2
 9          4.4         2.9          1.4         0.2 setosa      1 # Should be 2
10          4.9         3.1          1.5         0.1 setosa      1 # Should be 2

I could use case_when or if_else for these two columns, but the actual dataset has many more columns, so I'm looking for a dplyr solution where I don't have to type out every column name.

CodePudding user response:

library(tidyverse)

iris %>% 
  mutate(
    count = rowSums(across(contains("Sepal"), ~ between(.x, 1, 5)))
  )

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species count
1            5.1         3.5          1.4         0.2     setosa     1
2            4.9         3.0          1.4         0.2     setosa     2
3            4.7         3.2          1.3         0.2     setosa     2
4            4.6         3.1          1.5         0.2     setosa     2
5            5.0         3.6          1.4         0.2     setosa     2
6            5.4         3.9          1.7         0.4     setosa     1
7            4.6         3.4          1.4         0.3     setosa     2
8            5.0         3.4          1.5         0.2     setosa     2
9            4.4         2.9          1.4         0.2     setosa     2
10           4.9         3.1          1.5         0.1     setosa     2
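
As for why the original attempt gives 1 in every row: if_any() collapses the selected columns into a single TRUE/FALSE per row, so sum() over it can never exceed 1. Here is a minimal sketch on the first three rows of iris, assuming dplyr 1.0.4 or later is installed (the names demo and any_hit are only for illustration):

```r
library(dplyr)

demo <- head(iris, 3) %>%
  mutate(
    # if_any() answers "does at least one Sepal column qualify?": one logical per row
    any_hit = if_any(contains("Sepal"), ~ between(.x, 1, 5)),
    # across() keeps one logical column per Sepal column; rowSums() counts the TRUEs
    count = rowSums(across(contains("Sepal"), ~ between(.x, 1, 5)))
  )

# any_hit is TRUE in every row, while count is 1, 2, 2
demo[, c("Sepal.Length", "Sepal.Width", "any_hit", "count")]
```

Summing any_hit would therefore give 1 for each of these rows, which matches the unwanted output in the question.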

EDIT:

Alternatively, with c_across(). To my understanding, c_across() has to be combined with rowwise() so that the aggregation is computed within each row rather than over whole columns.

iris %>%
  rowwise() %>%
  mutate(count = sum(between(c_across(contains("Sepal")), 1, 5)))
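
To check that the two approaches agree, something like the following should work. It is only a sketch, assuming dplyr is installed; the column names count_rowsums and count_rowwise are made up for the comparison. The ungroup() call drops the rowwise grouping so that later verbs operate on whole columns again.

```r
library(dplyr)

res <- iris %>%
  # vectorised version: one rowSums() call over all rows
  mutate(count_rowsums = rowSums(across(contains("Sepal"), ~ between(.x, 1, 5)))) %>%
  # rowwise version: c_across() gathers the Sepal values of the current row
  rowwise() %>%
  mutate(count_rowwise = sum(between(c_across(contains("Sepal")), 1, 5))) %>%
  ungroup()

# TRUE if both approaches give the same count in every row
all(res$count_rowsums == res$count_rowwise)
```

On larger data the rowSums() version is typically the faster of the two, since rowwise() evaluates the expression once per row.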