I have a flexible vector of column names (in real life it can vary a lot and depends on an external table, so I cannot rely on slicing, on across, or on anything that depends on the variable names themselves in my df).
I would like to group my df and sum up the variables whose names match the names in the "possible_comb" vector, then apply a "_sum" suffix to the output variable names, e.g. Jon.A_sum.
My df has several variables; not all of them should be summed, only a selected and flexible list matching the "possible_comb" names.
In the code below I am missing how to rename the output variables with the _sum suffix in the lapply step, if that is possible, but I'm open to other looping approaches.
possible_comb <- c("Jon.A", "Bill.C", "Maria.E", "Ben.D")
Jon.A <- c(23, 41, 32, 58, 26)
Bill.C <- c(13, 41, 35, 18, 66)
v3 <- c(3, 34, 33, 34, 23)
weight <- c(2, 2, 3, 3, 6)
df <- data.frame(Jon.A, Bill.C, v3, weight)
library(data.table)
setDT(df)
df_grouped <- df[, lapply(.SD, sum), by = c("weight"), .SDcols = possible_comb]
#wanted results
Jon.A_sum <- c(64, 90, 26)
Bill.C_sum <- c(54,53, 66)
weight <- c(2,3, 6)
wanted <- data.frame(Jon.A_sum,Bill.C_sum,weight)
CodePudding user response:
data.table solution:
library(data.table)
possible_comb <- c("Jon.A", "Bill.C")
new_cols <- paste0(possible_comb, '_sum')
df_grouped <- df[, setNames(lapply(.SD, sum), new_cols),
                 by = c("weight"), .SDcols = possible_comb]
df_grouped
# weight Jon.A_sum Bill.C_sum
#1: 2 64 54
#2: 3 90 53
#3: 6 26 66
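The question's vector also lists names that are not columns of df ("Maria.E", "Ben.D"), and .SDcols stops with an error when a requested column is missing. The following is a minimal sketch, assuming the goal is to skip absent names silently; present_cols is a name introduced here for illustration, and the data are copied from the question.

```r
library(data.table)

# Full vector from the question, including names that are not in df
possible_comb <- c("Jon.A", "Bill.C", "Maria.E", "Ben.D")

df <- data.table(
  Jon.A  = c(23, 41, 32, 58, 26),
  Bill.C = c(13, 41, 35, 18, 66),
  v3     = c(3, 34, 33, 34, 23),
  weight = c(2, 2, 3, 3, 6)
)

# Keep only the requested names that actually exist as columns (assumption:
# missing names should be dropped rather than reported)
present_cols <- intersect(possible_comb, names(df))

# Sum the surviving columns per weight and rename them in the same step
df_grouped <- df[, setNames(lapply(.SD, sum), paste0(present_cols, "_sum")),
                 by = weight, .SDcols = present_cols]
df_grouped
```

intersect() keeps the order of possible_comb, so the new names built with paste0() line up with the columns in .SD.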
In dplyr, you can use across with group_by and assign new names with the .names argument.
library(dplyr)
df %>%
  group_by(weight) %>%
  summarise(across(all_of(possible_comb), sum, .names = '{col}_sum'))
# weight Jon.A_sum Bill.C_sum
# <dbl> <dbl> <dbl>
#1 2 64 54
#2 3 90 53
#3 6 26 66
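Note that all_of() raises an error if any requested name is missing from the data, which would happen with the four-name vector from the question. A hedged variant, assuming missing names should simply be ignored, swaps in tidyselect's any_of(); the data are copied from the question.

```r
library(dplyr)

# Full vector from the question, including names that are not in df
possible_comb <- c("Jon.A", "Bill.C", "Maria.E", "Ben.D")

df <- data.frame(
  Jon.A  = c(23, 41, 32, 58, 26),
  Bill.C = c(13, 41, 35, 18, 66),
  v3     = c(3, 34, 33, 34, 23),
  weight = c(2, 2, 3, 3, 6)
)

# any_of() selects only the names that exist and ignores the rest
df_grouped <- df %>%
  group_by(weight) %>%
  summarise(across(any_of(possible_comb), sum, .names = "{.col}_sum"))
df_grouped
```

Use all_of() instead when a missing column should be treated as a mistake rather than skipped.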
CodePudding user response:
If I understand your desired output correctly, you can do something like this:
# keep only the names in possible_comb that are actually columns of df
cols_to_use <- intersect(possible_comb, names(df))
df_grouped <- df[, lapply(.SD, sum), by = c("weight"), .SDcols = cols_to_use]
setcolorder(df_grouped, cols_to_use)
setnames(df_grouped, old = cols_to_use, new = paste(cols_to_use, "sum", sep = "_"))