I have a problem that's simple: I want to get the sum across some columns (a
and b
in my example) of strings equal to -999
.
df = tibble(a = c('-999', 'b', '-999'),
b = c('-999', 'a', 'b'),
c = 1:3)
from this:
a b c
<chr> <chr> <int>
1 -999 -999 1
2 b a 2
3 -999 b 3
to this:
a b c sum999
<chr> <chr> <int> <dbl>
1 -999 -999 1 2
2 b a 2 0
3 -999 b 3 1
I managed to do it in a not so straightforward way:
df %>%
mutate(across(matches('^[ab]'), ~if_else(.x == '-999', 1, 0),
.names = '{.col}_' ) ) %>%
rowwise() %>%
mutate(sum999 = sum(c_across(matches('^[ab]_')) ),
.keep = 'unused')
So, my question is, am I missing a better way to do this? Perhaps using rowSums
?
Thanks
CodePudding user response:
Two dplyr variants:
df %>%
mutate(sum999 = rowSums(select(., a:b) == "-999"))
# # A tibble: 3 x 4
# a b c sum999
# <chr> <chr> <int> <dbl>
# 1 -999 -999 1 2
# 2 b a 2 0
# 3 -999 b 3 1
df %>%
rowwise() %>%
mutate(sum999 = sum(c_across(a:b) == "-999")) %>%
ungroup()
(I prefer the first since it's a bit more performant.)
CodePudding user response:
The old way:
df$sum999 <- rowSums(df[, c("a", "b")] == "-999")
If you make a comparison between a value and a 2D object like a matrix or a data frame, you get a matrix with logical values:
a b
[1,] TRUE TRUE
[2,] FALSE FALSE
[3,] TRUE FALSE
After that, you add the rows and get a vector of length nrow(df)
.