Home > database >  Sum every 3 columns of a dataframe to form new columns
Sum every 3 columns of a dataframe to form new columns

Time:10-29

I have a simple dataframe like the following:

ID, Type, a, b, c, d, e, f, etc.
ob1, 1,   1, 2, 3, 4, 5, 6, etc.
ob1, 2,   3, 4, 5, 6, 7, 1, etc.

I need to add the values of every 3 columns together, to produce new columns with the summed values. This would produce the following output:

ID, Type, sum1, sum2,  etc.
ob1, 1,     6,   15,   etc.
ob1, 2,    12,   14,   etc.

Using sequencing, I can do this manually for individual columns, but because I have many columns, how can I perform this summation automatically for every 3 columns (after a set starting point)?

CodePudding user response:

You can split your dataframe into pieces. Here, I am using n_start to set the starting column and exclude those columns before that from splitting (df1[,-(1:(n_start-1))]).

Then I loop through (map) the sequence of the columns to slice the dataframe for every 3 columns (crating a list of dataframes).

Using map_dfc I get the rowSums for each of these new sliced dataframes.

I set the names to sum1, sum2, etc. and lastly bind the excluded columns before the starting point to get the final result.

library(tidyverse)

n_split <- 3
n_start <- 3

seq(n_split, ncol(df1), n_split) %>%   
  map(~ select(df1[,-(1:(n_start-1))],(.-(n_split-1)):.)) %>%  
  map_dfc(rowSums) %>% 
  set_names(., nm = paste0("sum", seq(ncol(.)))) %>% 
  bind_cols(df1[,1:(n_start-1)], .)
#> # A tibble: 5 x 4
#    ID type sum1 sum2 sum3
# 1 ob1    1    4   12   21
# 2 ob2    2   25   42   51
# 3 ob3    3   46   72   81
# 4 ob4    4   67  102  111
# 5 ob5    5   88  132  141

Data:

df1 <- data.frame(ID = c("ob1", "ob2", "ob3", "ob4", "ob5"),
                  type = c(1, 2, 3, 4, 5),
                  a = c(1, 11, 21, 31, 41), 
                  b = c(2, 12, 22, 32, 42), 
                  c = c(3, 13, 23, 33, 43), 
                  d = c(4, 14, 24, 34, 44), 
                  e = c(5, 15, 25, 35, 45), 
                  f = c(6, 16, 26, 36, 46), 
                  g = c(7, 17, 27, 37, 47), 
                  h = c(8, 18, 28, 38, 48), 
                  i = c(9, 19, 29, 39, 49))

CodePudding user response:

You can do it just like that. Where df is your original data.frame you would use the mutate command:

library(dplyr)
    
df%>%
    mutate(sum1=(a b c),
           sum2=(d e f))%>%
    select(ID, Type, sum1, sum2, etc)

CodePudding user response:

In base R you can do something like this:

num_cols <- df[-c(1:2)]

cbind(df[1:2], do.call(cbind, 
                       lapply(setNames(seq(1,length(num_cols), 3), 
                                       paste0("sum", seq(length(num_cols)/3))), \(a) {
  apply(num_cols[a:(a   2)], 1, \(b) sum(as.numeric(gsub(",", "", b))))
  
})))

Because there are commas, I used gsub to remove them, setNames is used to give each column a dynamic name, apply is used within lapply to summarise each row

   ID. Type. sum1 sum2
1 ob1,    1,    6   15
2 ob1,    2,   12   14
  • Related