I am trying to sum every n elements in a data frame for specific columns. For other columns, the nth element should be picked up.
For example, if n=5
and dataframe is data.frame(X=1:15,Y=1:15)
, then (let's say the first column is summed up in groups of 5 and second column is taken at nth value) result should be X=15,40,65 and Y=5,10,15.
The column numbers to sum are provided by a different variable (e.g., in above scenario it would contain 1). I've tried using 2 loops to handle this scenario, one for add and other for nth value, but I am then left with 2 different dataframes where order of columns is lost and code becomes inefficient. Is there a way to efficiently generate a new dataframe where column numbers present in a provided variable are added in group of n whereas others are taken at nth value? Appreciate any help on this.
CodePudding user response:
One way would be to specify the number of rows (and the nth) element using tidyverse
. So, here I use rep
to create groups for every 5 rows. Then, we can get the sum for a specific column (i.e., X
) by group, then use nth
to select the fifth element for the other column, Y
.
library(tidyverse)
df %>%
group_by(grp = rep(row_number(), length.out = n(), each = 5)) %>%
summarise(X = sum(X),Y = nth(Y, 5)) %>%
select(-grp)
Output
# A tibble: 3 × 2
X Y
<int> <int>
1 15 5
2 40 10
3 65 15
Data
df <- structure(list(X = 1:15, Y = 1:15),
class = "data.frame",
row.names = c(NA, -15L))
The nice thing about this approach is that we could adapt it for multiple columns for each function, as I suspect you might want to sum several columns and get the nth element of several columns. So, we can make use of across
to specify the columns that we want to return.
df2 %>%
group_by(grp = rep(row_number(), length.out = n(), each = 5)) %>%
summarise(across(c(X,Z), sum), across(c(Y,A), nth, 5)) %>%
select(-grp)
Output
# A tibble: 3 × 4
X Z Y A
<int> <int> <int> <int>
1 15 65 5 15
2 40 90 10 20
3 65 115 15 25
Data
df2 <- structure(list(X = 1:15, Y = 1:15, Z = 11:25, A = 11:25),
class = "data.frame", row.names = c(NA, -15L))
df2
X Y Z A
1 1 1 11 11
2 2 2 12 12
3 3 3 13 13
4 4 4 14 14
5 5 5 15 15
6 6 6 16 16
7 7 7 17 17
8 8 8 18 18
9 9 9 19 19
10 10 10 20 20
11 11 11 21 21
12 12 12 22 22
13 13 13 23 23
14 14 14 24 24
15 15 15 25 25
CodePudding user response:
library(dplyr)
# assign n = ?
n <- 5
# result
data.frame(X=1:15,Y=1:15) %>%
mutate(idx = rep(1:n(), each = n)[1:n()]) %>%
group_by(idx) %>%
summarise_all(sum)