My example
df <- data.frame(id1 = c("a" , "b", "c"),
id2 = c("a", "a", "d"),
n1 = c(2,2,0),
n2 = c(2,1,1),
n3 = c(0,1,1),
n4 = c(0,1,1))
First, I already aggregated all rows across column like this
df <- df %>%
group_by(id2) %>%
summarise(across(c(n1,n2,n3,n4), sum, na.rm = TRUE),
.groups = "drop")
Now, but now I would like to aggregate only 2 first rows having a
in column id2
. How we keep the column id1
since my desire output like this. Honestly, that column is just used to compare to id2
and is quite redundant, but I really want to keep it.
id1 id2 n1 n2 n3 n4
a a 4 3 1 1
c d 0 1 1 1
Any suggestions for this?
CodePudding user response:
Change the id2
values where it has 'a'
in it.
library(dplyr)
df %>%
group_by(id1 = ifelse(id2 == 'a', id2, id1), id2) %>%
summarise(across(starts_with('n'), sum, na.rm = TRUE), .groups = "drop")
# id1 id2 n1 n2 n3 n4
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 a a 4 3 1 1
#2 c d 0 1 1 1
CodePudding user response:
Other solution would be using case_when
. This function is more readable if you need to use multiple casuistic sentences:
library(dplyr)
df %>%
mutate(id1 = case_when(
id2 == 'a' ~ id2,
TRUE ~ id1
)) %>%
group_by(id1, id2) %>%
summarise(across(starts_with('n'), sum, na.rm = TRUE),
.groups = "drop")
which yields:
## A tibble: 2 x 6
# id1 id2 n1 n2 n3 n4
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 a a 4 3 1 1
#2 c d 0 1 1 1
Note: The
summarise
part was copied from @Ronak Shah's answer