I have a simple dataframe like the following:
ID, Type, a, b, c, d, e, f, etc.
ob1, 1, 1, 2, 3, 4, 5, 6, etc.
ob1, 2, 3, 4, 5, 6, 7, 1, etc.
I need to add the values of every 3 columns together, to produce new columns with the summed values. This would produce the following output:
ID, Type, sum1, sum2, etc.
ob1, 1, 6, 15, etc.
ob1, 2, 12, 14, etc.
Using sequencing, I can do this manually for individual columns, but because I have many columns, how can I perform this summation automatically for every 3 columns (after a set starting point)?
CodePudding user response:
You can split your dataframe into pieces. Here, I am using n_start
to set the starting column and exclude those columns before that from splitting (df1[,-(1:(n_start-1))]
).
Then I loop through (map
) the sequence of the columns to slice the dataframe for every 3 columns (crating a list of dataframes).
Using map_dfc
I get the rowSums
for each of these new sliced dataframes.
I set the names to sum1, sum2, etc. and lastly bind the excluded columns before the starting point to get the final result.
library(tidyverse)
n_split <- 3
n_start <- 3
seq(n_split, ncol(df1), n_split) %>%
map(~ select(df1[,-(1:(n_start-1))],(.-(n_split-1)):.)) %>%
map_dfc(rowSums) %>%
set_names(., nm = paste0("sum", seq(ncol(.)))) %>%
bind_cols(df1[,1:(n_start-1)], .)
#> # A tibble: 5 x 4
# ID type sum1 sum2 sum3
# 1 ob1 1 4 12 21
# 2 ob2 2 25 42 51
# 3 ob3 3 46 72 81
# 4 ob4 4 67 102 111
# 5 ob5 5 88 132 141
Data:
df1 <- data.frame(ID = c("ob1", "ob2", "ob3", "ob4", "ob5"),
type = c(1, 2, 3, 4, 5),
a = c(1, 11, 21, 31, 41),
b = c(2, 12, 22, 32, 42),
c = c(3, 13, 23, 33, 43),
d = c(4, 14, 24, 34, 44),
e = c(5, 15, 25, 35, 45),
f = c(6, 16, 26, 36, 46),
g = c(7, 17, 27, 37, 47),
h = c(8, 18, 28, 38, 48),
i = c(9, 19, 29, 39, 49))
CodePudding user response:
You can do it just like that. Where df
is your original data.frame you would use the mutate
command:
library(dplyr)
df%>%
mutate(sum1=(a b c),
sum2=(d e f))%>%
select(ID, Type, sum1, sum2, etc)
CodePudding user response:
In base R you can do something like this:
num_cols <- df[-c(1:2)]
cbind(df[1:2], do.call(cbind,
lapply(setNames(seq(1,length(num_cols), 3),
paste0("sum", seq(length(num_cols)/3))), \(a) {
apply(num_cols[a:(a 2)], 1, \(b) sum(as.numeric(gsub(",", "", b))))
})))
Because there are commas, I used gsub
to remove them,
setNames
is used to give each column a dynamic name,
apply
is used within lapply
to summarise each row
ID. Type. sum1 sum2
1 ob1, 1, 6 15
2 ob1, 2, 12 14