Considering this is my dataset
df <-
structure(list(id = 1:6, w = c(NA, 11L, 12L, 13L, 14L, 15L),
x = c(20L, 21L, NA, 23L, 24L, 25L), y = c(30L, 31L, 32L,
NA, 34L, 35L), z = c(40L, NA, 42L, 43L, 44L, NA), Group = c("Group1",
"Group2", "Group2", "Group2", "Group3", "Group3")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
id w x y z Group
1 NA 20 30 40 Group1
2 11 21 31 NA Group2
3 12 NA 32 42 Group2
4 13 23 NA 43 Group2
5 14 24 34 44 Group3
6 15 25 35 NA Group3
I am trying to sum observations from columns w, x, y,z row wise by Groups. I am using case_when
statement like this.
df <- df %>%
mutate(
group1_total <- case_when( Group = "Group1" ~ rowSums(w,x, na.rm = TRUE)),
group2_total <- case_when( Group = "Group2" ~ rowSums(w,x,z, na.rm = TRUE)),
group3_total <- case_when( Group = "Group3" ~ rowSums(w,x,y,z, na.rm = TRUE))
)
I am getting an error saying, error in rowSums()
. Not sure what I am doing wrong. Any suggestions or help is much appreciated, Thanks.
CodePudding user response:
I would create a named list where we specify the variables to be summed for each group.
cur_data()
returns the data for the current group, group_cols[[unique(Group)]]
selects the appropriate columns for each group.
library(dplyr)
group_cols <- list('Group1' = c('w', 'x'), 'Group2' = c('w', 'x', 'z'),
'Group3' = c('w', 'x', 'y', 'z'))
df %>%
group_by(Group) %>%
mutate(total = rowSums(select(cur_data(),
group_cols[[unique(Group)]]), na.rm = TRUE)) %>%
ungroup
# id w x y z Group total
# <int> <int> <int> <int> <int> <chr> <dbl>
#1 1 NA 20 30 40 Group1 20
#2 2 11 21 31 NA Group2 32
#3 3 12 NA 32 42 Group2 54
#4 4 13 23 NA 43 Group2 79
#5 5 14 24 34 44 Group3 116
#6 6 15 25 35 NA Group3 75
The benefit of this is the calculation is done per group instead of per row.
CodePudding user response:
You could the combination of rowwise()
, c_across()
(c_across()
is designed to work with rowwise()
to make it easy to perform row-wise aggregations) and sum()
, e.g.
library(dplyr)
df %>%
rowwise() %>%
mutate(
group1_total = case_when( Group == "Group1" ~ sum(c_across(w:x), na.rm = TRUE)),
group2_total = case_when( Group == "Group2" ~ sum(c_across(c(w,x,z)), na.rm = TRUE)),
group3_total = case_when( Group == "Group3" ~ sum(c_across(w:z), na.rm = TRUE))
)
output:
id w x y z Group group1_total group2_total group3_total
<int> <int> <int> <int> <int> <chr> <int> <int> <int>
1 1 NA 20 30 40 Group1 20 NA NA
2 2 11 21 31 NA Group2 NA 32 NA
3 3 12 NA 32 42 Group2 NA 54 NA
4 4 13 23 NA 43 Group2 NA 79 NA
5 5 14 24 34 44 Group3 NA NA 116
6 6 15 25 35 NA Group3 NA NA 75
If you are interested in one total column you could then use coalesce()
, eg.
df %>%
rowwise() %>%
mutate(
group1_total = case_when( Group == "Group1" ~ sum(c_across(w:x), na.rm = TRUE)),
group2_total = case_when( Group == "Group2" ~ sum(c_across(c(w,x,z)), na.rm = TRUE)),
group3_total = case_when( Group == "Group3" ~ sum(c_across(w:z), na.rm = TRUE))
) %>%
mutate(total = coalesce(group1_total, group2_total,group3_total)) %>%
select(-contains("_total"))
Resulting in
# A tibble: 6 x 7
# Rowwise:
id w x y z Group total
<int> <int> <int> <int> <int> <chr> <int>
1 1 NA 20 30 40 Group1 20
2 2 11 21 31 NA Group2 32
3 3 12 NA 32 42 Group2 54
4 4 13 23 NA 43 Group2 79
5 5 14 24 34 44 Group3 116
6 6 15 25 35 NA Group3 75