Home > Back-end >  Sum every n elements for specific columns and take nth value of other columns [in R]
Sum every n elements for specific columns and take nth value of other columns [in R]

Time:01-03

I am trying to sum every n elements in a data frame for specific columns. For other columns, the nth element should be picked up.

For example, if n=5 and dataframe is data.frame(X=1:15,Y=1:15), then (let's say the first column is summed up in groups of 5 and second column is taken at nth value) result should be X=15,40,65 and Y=5,10,15.

The column numbers to sum are provided by a different variable (e.g., in above scenario it would contain 1). I've tried using 2 loops to handle this scenario, one for add and other for nth value, but I am then left with 2 different dataframes where order of columns is lost and code becomes inefficient. Is there a way to efficiently generate a new dataframe where column numbers present in a provided variable are added in group of n whereas others are taken at nth value? Appreciate any help on this.

CodePudding user response:

One way would be to specify the number of rows (and the nth) element using tidyverse. So, here I use rep to create groups for every 5 rows. Then, we can get the sum for a specific column (i.e., X) by group, then use nth to select the fifth element for the other column, Y.

library(tidyverse)

df %>% 
  group_by(grp = rep(row_number(), length.out = n(), each = 5)) %>% 
  summarise(X = sum(X),Y = nth(Y, 5)) %>% 
  select(-grp)

Output

# A tibble: 3 × 2
      X     Y
  <int> <int>
1    15     5
2    40    10
3    65    15

Data

df <- structure(list(X = 1:15, Y = 1:15), 
  class = "data.frame", 
  row.names = c(NA, -15L))

The nice thing about this approach is that we could adapt it for multiple columns for each function, as I suspect you might want to sum several columns and get the nth element of several columns. So, we can make use of across to specify the columns that we want to return.

df2 %>% 
  group_by(grp = rep(row_number(), length.out = n(), each = 5)) %>% 
  summarise(across(c(X,Z), sum), across(c(Y,A), nth, 5)) %>% 
  select(-grp)

Output

# A tibble: 3 × 4
      X     Z     Y     A
  <int> <int> <int> <int>
1    15    65     5    15
2    40    90    10    20
3    65   115    15    25

Data

df2 <- structure(list(X = 1:15, Y = 1:15, Z = 11:25, A = 11:25), 
                 class = "data.frame", row.names = c(NA, -15L))

df2
    X  Y  Z  A
1   1  1 11 11
2   2  2 12 12
3   3  3 13 13
4   4  4 14 14
5   5  5 15 15
6   6  6 16 16
7   7  7 17 17
8   8  8 18 18
9   9  9 19 19
10 10 10 20 20
11 11 11 21 21
12 12 12 22 22
13 13 13 23 23
14 14 14 24 24
15 15 15 25 25

CodePudding user response:


library(dplyr)

# assign n = ?
n <- 5

# result
data.frame(X=1:15,Y=1:15) %>%
  mutate(idx = rep(1:n(), each = n)[1:n()]) %>%
  group_by(idx) %>%
  summarise_all(sum)


  •  Tags:  
  • r
  • Related