Home > Net >  Direct way to conditionally sum rows if condition is met in dplyr
Direct way to conditionally sum rows if condition is met in dplyr

Time:11-11

I have a problem that's simple: I want to get the sum across some columns (a and b in my example) of strings equal to -999.

df = tibble(a = c('-999', 'b', '-999'),
           b = c('-999', 'a', 'b'),
           c = 1:3)

from this:

  a     b         c
  <chr> <chr> <int>
1 -999  -999      1
2 b     a         2
3 -999  b         3

to this:

 a     b         c sum999
  <chr> <chr> <int>  <dbl>
1 -999  -999      1      2
2 b     a         2      0
3 -999  b         3      1

I managed to do it in a not so straightforward way:

df %>% 
  mutate(across(matches('^[ab]'), ~if_else(.x == '-999', 1, 0), 
                                        .names = '{.col}_' ) ) %>% 
      rowwise() %>% 
      mutate(sum999 = sum(c_across(matches('^[ab]_'))  ), 
                                        .keep = 'unused')

So, my question is, am I missing a better way to do this? Perhaps using rowSums ?

Thanks

CodePudding user response:

Two dplyr variants:

df %>%
  mutate(sum999 = rowSums(select(., a:b) == "-999"))
# # A tibble: 3 x 4
#   a     b         c sum999
#   <chr> <chr> <int>  <dbl>
# 1 -999  -999      1      2
# 2 b     a         2      0
# 3 -999  b         3      1

df %>%
  rowwise() %>%
  mutate(sum999 = sum(c_across(a:b) == "-999")) %>%
  ungroup()

(I prefer the first since it's a bit more performant.)

CodePudding user response:

The old way:

df$sum999 <- rowSums(df[, c("a", "b")] == "-999")

If you make a comparison between a value and a 2D object like a matrix or a data frame, you get a matrix with logical values:

         a     b
[1,]  TRUE  TRUE
[2,] FALSE FALSE
[3,]  TRUE FALSE

After that, you add the rows and get a vector of length nrow(df).

  • Related