I have the following data:
library(tidyverse)
df <- data.frame(id = c(1,1,1,2,2,2),
                 x = rep(letters[1:2], each = 3),
                 y = c(3,4,3,5,6,5),
                 z = c(7,8,9,10,11,12))
I now want to summarize the data by id in a way where I get the sum of z depending on y values. The y condition itself depends on the value of x.
I thought I could use the code below, but it returns one row per input row instead of summarizing. The values are correct, but I want one row per id.
df %>%
  group_by(id) %>%
  summarize(test = case_when(x == 'a' ~ sum(z[y == 3]),
                             x == 'b' ~ sum(z[y == 5])))
# A tibble: 6 x 2
# Groups:   id [2]
     id  test
  <dbl> <dbl>
1     1    16
2     1    16
3     1    16
4     2    22
5     2    22
6     2    22
The following works, but I don't understand why it does and the above code does not.
df %>%
  group_by(id) %>%
  summarize(test = case_when(all(x == 'a') ~ sum(z[y == 3]),
                             all(x == 'b') ~ sum(z[y == 5])))
# A tibble: 2 x 2
     id  test
  <dbl> <dbl>
1     1    16
2     2    22
Also, is there a more straightforward way to do my summarization?
CodePudding user response:
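Because case_when, like ifelse(test, yes, no), returns a vector of the same length as its test condition. Within a group of three rows, x == 'a' has length 3, so summarize gets three values per id. all(x == 'a') has length 1, and so the returned value is of length 1.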
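To make the length argument concrete, and as one possible answer to the "more straightforward" part, here is a minimal sketch. It rebuilds the df from the question, checks the two condition lengths for the id == 1 group, and then sums z using a lookup vector. The names g, len_elementwise, len_all, target and res are made up for illustration, and the lookup approach assumes each x value maps to exactly one y value to keep.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE keeps x as character
# so that indexing the lookup vector by name behaves the same on older R.
df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                 x  = rep(letters[1:2], each = 3),
                 y  = c(3, 4, 3, 5, 6, 5),
                 z  = c(7, 8, 9, 10, 11, 12),
                 stringsAsFactors = FALSE)

# Rows of a single group, to inspect the condition lengths by hand
g <- df[df$id == 1, ]
len_elementwise <- length(g$x == "a")       # one logical per row in the group
len_all         <- length(all(g$x == "a"))  # a single logical for the group

# Hypothetical lookup: which y value to keep for each x value
target <- c(a = 3, b = 5)

# sum() always returns one value, so each id yields exactly one row
res <- df %>%
  group_by(id) %>%
  summarize(test = sum(z[y == target[x]]))

print(res)
```

Because sum() collapses each group to a single number, there is no need for case_when here. If the mapping from x to y grows, only the target vector has to change.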