I have a dataframe with gene expression data by lane (column). What I would like to do is write a loop that takes the sum of each row but progressively adds in another column each time. So each time I loop through I add another column to my dataframe that contains the sums of each row plus another column to the end of the dataframe. In the example below I did this using the apply() function by hand but this is very inefficient and not feasible for a large data set. I messed around with the cumsum() function but couldn't seem to get it to work for this. Very possible I missed something obvious but any guidance would be great!
#Example dataframe
c1 <- c('G1', 'G2', 'G3')
c2 <- c(5, 3, 1)
c3 <- c(3, 7, 1)
c4 <- c(6, 3, 4)
c5 <- c(6, 4, 3)
df <- data.frame(c1, c2, c3, c4, c5)
#Cal cumulative sums
sum.2.3 <- apply(df[,2:3],1,sum)
sum.2.4 <- apply(df[,2:4],1,sum)
sum.2.5 <- apply(df[,2:5],1,sum)
df <- cbind(df, sum.2.3, sum.2.4, sum.2.5)
CodePudding user response:
If the problem is the loop, you use apply inside it.
Code
start_col <- 2
end_col <- ncol(df)
for(i in (start_col 1):end_col){
var_name <- paste("sum",start_col,i,sep = ".")
df[,var_name] <- apply(df[,start_col:i],1,sum)
}
Output
c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
1 G1 5 3 6 6 8 14 20
2 G2 3 7 3 4 10 13 17
3 G3 1 1 4 3 2 6 9
CodePudding user response:
You can use Reduce()
Reduce(` `, df[-1], accumulate = TRUE)[-1]
[[1]]
[1] 8 10 2
[[2]]
[1] 14 13 6
[[3]]
[1] 20 17 9
Assign into the data frame:
df[paste0("sum.2.", 3:5)] <- Reduce(` `, df[-1], accumulate = TRUE)[-1]
Gives:
c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
1 G1 5 3 6 6 8 14 20
2 G2 3 7 3 4 10 13 17
3 G3 1 1 4 3 2 6 9
CodePudding user response:
No loop needed.
df <- data.frame(
c1 = c('G1', 'G2', 'G3'),
c2 = c(5, 3, 1),
c3 = c(3, 7, 1),
c4 = c(6, 3, 4),
c5 = c(6, 4, 3))
cbind(df, setNames(as.data.frame(t(apply(df[,-1], 1, cumsum))[,-1]), paste0("sum.2.", 3:5)))
#> c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
#> 1 G1 5 3 6 6 8 14 20
#> 2 G2 3 7 3 4 10 13 17
#> 3 G3 1 1 4 3 2 6 9
CodePudding user response:
Using rowCumsums
from matrixStats
library(matrixStats)
df[paste0("sum.2.", 3:5)] <- rowCumsums(as.matrix(df[2:5]))[,-1]
-output
> df
c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
1 G1 5 3 6 6 8 14 20
2 G2 3 7 3 4 10 13 17
3 G3 1 1 4 3 2 6 9
CodePudding user response:
You can use both the mutate
function from the dplyr
package and the rowSums
base function.
library(dplyr)
c1 <- c('G1', 'G2', 'G3')
c2 <- c(5, 3, 1)
c3 <- c(3, 7, 1)
c4 <- c(6, 3, 4)
c5 <- c(6, 4, 3)
df <- data.frame(c1, c2, c3, c4, c5)
df <- df %>%
dplyr::mutate(sum.2.3 = rowSums(across(c2:c3)),
sum.2.4 = rowSums(across(c2:c4)),
sum.2.5 = rowSums(across(c2:c5)))
Result
c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
1 G1 5 3 6 6 8 14 20
2 G2 3 7 3 4 10 13 17
3 G3 1 1 4 3 2 6 9