My data frame looks like this:
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
row.names = c("gene1","gene2","gene3","gene4","gene5"))
> df
stat patient1 patient2 patient3
gene1 3.38 -0.44 0.400 0.350
gene2 -3.40 -0.22 0.045 0.210
gene3 4.45 0.80 -0.140 -0.230
gene4 -4.21 -0.21 -0.078 -0.019
gene5 3.33 -0.22 -0.160 -0.210
I am struggling to write a script or loop that computes, for each patient column, the sum of the element-wise products of that column with the 'stat' column. My real patient dataset has 141 columns and 142 rows, so doing this by hand is not practical.
So I would like to add a new row called "Signature Score" that holds these values, computed as follows:
# append an empty row first; renaming the last row would overwrite gene5
df["Signature Score", ] <- NA
sum_multi_1 <- sum(df[c(1:nrow(df)-1),2]*df[c(1:nrow(df)-1),1])
sum_multi_2 <- sum(df[c(1:nrow(df)-1),3]*df[c(1:nrow(df)-1),1])
sum_multi_3 <- sum(df[c(1:nrow(df)-1),4]*df[c(1:nrow(df)-1),1])
df[nrow(df),2] <- sum_multi_1
df[nrow(df),3] <- sum_multi_2
df[nrow(df),4] <- sum_multi_3
which gives:
> df
stat patient1 patient2 patient3
gene1 3.38 -0.4400 0.40000 0.35000
gene2 -3.40 -0.2200 0.04500 0.21000
gene3 4.45 0.8000 -0.14000 -0.23000
gene4 -4.21 -0.2100 -0.07800 -0.01900
gene5 3.33 -0.2200 -0.16000 -0.21000
Signature Score NA 2.9723 0.37158 -1.17381
I tried to write a for loop like this:
for (i in 1:nrow(df)){
df[nrow(df),i+1] <- sum(df[c(1:nrow(df)-1,i+1)]*df[c(1:nrow(df)-1),1])
}
but it doesn't do the job. Can anyone please tell me what I am missing or what I need to write?
All the best, Tj
CodePudding user response:
You can use mutate and across to multiply each patient column by stat, and then add a totals row with adorn_totals() from the janitor package.
library(dplyr)
library(tibble) # provides rownames_to_column()
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
row.names = c("gene1","gene2","gene3","gene4","gene5")) %>%
rownames_to_column(var = "genes") %>%
mutate(across(patient1:patient3, ~.x * stat)) %>%
janitor::adorn_totals(name = "Signature Score")
# the column total of stat is not meaningful, so blank it out in the totals row
df[nrow(df), "stat"] <- NA
Output (note that the patient columns now hold the products with stat rather than the original values):
genes stat patient1 patient2 patient3
gene1 3.38 -1.4872 1.35200 1.18300
gene2 -3.40 0.7480 -0.15300 -0.71400
gene3 4.45 3.5600 -0.62300 -1.02350
gene4 -4.21 0.8841 0.32838 0.07999
gene5 3.33 -0.7326 -0.53280 -0.69930
Signature Score NA 2.9723 0.37158 -1.17381
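If the original patient values should stay untouched, as in the output the question asks for, one possible variation is to compute the scores with summarise() and append them with bind_rows(). This is only a sketch: the names scores and result are made up for illustration, and it assumes every patient column name starts with "patient".

```r
library(dplyr)
library(tibble)

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1", "gene2", "gene3", "gene4", "gene5"))

# One-row data frame holding sum(patient * stat) for each patient column
# (assumption: all patient columns share the "patient" prefix)
scores <- df %>%
  summarise(across(starts_with("patient"), ~ sum(.x * stat)))

# Move the row names into a column, then append the score row.
# bind_rows() matches columns by name, so the missing stat entry becomes NA.
result <- df %>%
  rownames_to_column(var = "genes") %>%
  bind_rows(mutate(scores, genes = "Signature Score"))

print(result)
```

Here the gene rows keep their original values and only the last row contains the sums.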
CodePudding user response:
Another possible solution, in base R:
rbind(df, signa = c(NA,colSums(df[,1] * df[-1])))
#> stat patient1 patient2 patient3
#> gene1 3.38 -0.4400 0.40000 0.35000
#> gene2 -3.40 -0.2200 0.04500 0.21000
#> gene3 4.45 0.8000 -0.14000 -0.23000
#> gene4 -4.21 -0.2100 -0.07800 -0.01900
#> gene5 3.33 -0.2200 -0.16000 -0.21000
#> signa NA 2.9723 0.37158 -1.17381
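The one-liner works because multiplying a data frame by a vector of length nrow(df) recycles that vector down each column. Since every score is a dot product between stat and one patient column, the same numbers can also be obtained with a single matrix product. The following is a sketch of that alternative, not part of the answer above; the variable name scores is illustrative.

```r
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1", "gene2", "gene3", "gene4", "gene5"))

# crossprod(X, y) computes t(X) %*% y, i.e. one dot product per patient column;
# drop() turns the resulting 3 x 1 matrix into a named vector
scores <- drop(crossprod(as.matrix(df[-1]), df$stat))
print(scores)

# Append the scores as a new row, with NA in the stat column
result <- rbind(df, `Signature Score` = c(NA, scores))
print(result)
```

With 141 patient columns this stays a single call, and the result should agree with the colSums() version.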
CodePudding user response:
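I noticed that you subtracted 1, perhaps so that the indices start at 0. However, unlike in Python, indices in R start at 1. Note also that 1:nrow(df)-1 is evaluated as (1:nrow(df))-1, i.e. 0:(nrow(df)-1), and the index 0 is silently dropped when subsetting. So could it be that you want this: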
colSums(df[-1]*df$stat)
# patient1 patient2 patient3
# 2.97230 0.37158 -1.17381
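If you still prefer an explicit loop, here is a minimal sketch of how it could look. It iterates over the patient columns rather than over the rows, and collects the results in a separate vector before appending them. The names n, scores, j and result are only illustrative.

```r
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1", "gene2", "gene3", "gene4", "gene5"))

# Operator precedence: ":" binds more tightly than "-"
n <- nrow(df)
print(1:n - 1)    # 0 1 2 3 4
print(1:(n - 1))  # 1 2 3 4

# One slot per column; the stat slot stays NA
scores <- rep(NA_real_, ncol(df))
names(scores) <- names(df)

# Loop over the patient columns (2 to ncol(df)), not over the rows
for (j in 2:ncol(df)) {
  scores[j] <- sum(df[[j]] * df$stat)
}

# Append the scores as a new row
result <- rbind(df, `Signature Score` = scores)
print(result)
```

Computing the scores before adding the new row avoids having to exclude that row from the sums.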
CodePudding user response:
You are overcomplicating this.
To make the code clearer, define an auxiliary function fun that multiplies two vectors and sums the result. Then use apply to run that function over each patient column.
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
row.names = c("gene1","gene2","gene3","gene4","gene5"))
# auxiliary function
fun <- function(x, y) sum(x * y)
apply(df[-1], 2, fun, y = df[[1]])
#> patient1 patient2 patient3
#> 2.97230 0.37158 -1.17381
sigscore <- apply(df[-1], 2, fun, y = df[[1]])
rbind(df, `Signature Score` = c(NA, sigscore))
#> stat patient1 patient2 patient3
#> gene1 3.38 -0.4400 0.40000 0.35000
#> gene2 -3.40 -0.2200 0.04500 0.21000
#> gene3 4.45 0.8000 -0.14000 -0.23000
#> gene4 -4.21 -0.2100 -0.07800 -0.01900
#> gene5 3.33 -0.2200 -0.16000 -0.21000
#> Signature Score NA 2.9723 0.37158 -1.17381
Created on 2022-05-05 by the reprex package (v2.0.1)