How to calculate sum of multiplication of each column in a dataset by the first column?

I have my data frame as below.

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33), 
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

> df
       stat patient1 patient2 patient3
gene1  3.38    -0.44    0.400    0.350
gene2 -3.40    -0.22    0.045    0.210
gene3  4.45     0.80   -0.140   -0.230
gene4 -4.21    -0.21   -0.078   -0.019
gene5  3.33    -0.22   -0.160   -0.210

I have been struggling to write a script or a loop that calculates the sum of the products of the 'stat' column and each patient column. My real patient dataset has 141 columns and 142 rows, so I cannot do this column by column.

So, I would like to have a new row called "Signature Score" whose values are calculated as follows:

# append an empty row to hold the scores, then name it
df[nrow(df) + 1, ] <- NA
row.names(df)[nrow(df)] <- "Signature Score"

# gene rows only, i.e. every row except the new last one
genes <- 1:(nrow(df) - 1)

sum_multi_1 <- sum(df[genes, 2] * df[genes, 1])
sum_multi_2 <- sum(df[genes, 3] * df[genes, 1])
sum_multi_3 <- sum(df[genes, 4] * df[genes, 1])

df[nrow(df), 2] <- sum_multi_1
df[nrow(df), 3] <- sum_multi_2
df[nrow(df), 4] <- sum_multi_3

which is...

> df
                 stat patient1 patient2 patient3
gene1            3.38  -0.4400  0.40000  0.35000
gene2           -3.40  -0.2200  0.04500  0.21000
gene3            4.45   0.8000 -0.14000 -0.23000
gene4           -4.21  -0.2100 -0.07800 -0.01900
gene5            3.33  -0.2200 -0.16000 -0.21000
Signature Score    NA   2.9723  0.37158 -1.17381

I was trying to make a for loop something like this...

for (i in 1:nrow(df)){
  df[nrow(df),i + 1] <- sum(df[c(1:nrow(df)-1,i + 1)]*df[c(1:nrow(df)-1),1])
}

but it doesn't do the job. Can anyone please tell me what I am missing or what I need to write?

All the best, Tj

CodePudding user response：

You can use mutate() and across() to multiply each patient column by stat, and then add a totals row with adorn_totals() from the janitor package.

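library(dplyr)
library(tibble)  # provides rownames_to_column()

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5")) %>%
  rownames_to_column(var = "genes") %>%
  # replace each patient value with its product with stat
  mutate(across(patient1:patient3, ~ .x * stat)) %>%
  # add a row of column sums labelled "Signature Score"
  janitor::adorn_totals(name = "Signature Score")

# adorn_totals() also sums the stat column, so blank that cell out
df[nrow(df), "stat"] <- NA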

Output:

           genes  stat patient1 patient2 patient3
           gene1  3.38  -1.4872  1.35200  1.18300
           gene2 -3.40   0.7480 -0.15300 -0.71400
           gene3  4.45   3.5600 -0.62300 -1.02350
           gene4 -4.21   0.8841  0.32838  0.07999
           gene5  3.33  -0.7326 -0.53280 -0.69930
 Signature Score    NA   2.9723  0.37158 -1.17381
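Note that the pipeline above overwrites the patient values with their products. If the original values should stay in the table, as in the desired output from the question, a variation with summarise() might look like the sketch below. The names scores and result are only illustrative, and it assumes dplyr is installed.

```r
library(dplyr)

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# One weighted sum per patient column; stat is used only as the weight
scores <- df %>%
  summarise(across(-stat, ~ sum(.x * stat)))

# The original values stay untouched; the new row gets NA in the stat column
result <- rbind(df, `Signature Score` = c(NA, unlist(scores)))
result
```

Because across(-stat, ...) selects every column except stat, the same call should work unchanged on the full dataset, provided all remaining columns are numeric.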

CodePudding user response：

Another possible solution, in base R:

rbind(df, signa = c(NA,colSums(df[,1] * df[-1])))

#>        stat patient1 patient2 patient3
#> gene1  3.38  -0.4400  0.40000  0.35000
#> gene2 -3.40  -0.2200  0.04500  0.21000
#> gene3  4.45   0.8000 -0.14000 -0.23000
#> gene4 -4.21  -0.2100 -0.07800 -0.01900
#> gene5  3.33  -0.2200 -0.16000 -0.21000
#> signa    NA   2.9723  0.37158 -1.17381
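To see why the one-liner works, it may help to look at the multiplication on its own. The sketch below compares it with an equivalent matrix product. The names by_recycling and by_crossprod are made up for illustration.

```r
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# df[, 1] is applied down every column of df[-1], so each patient column
# is multiplied element-wise by stat before colSums() adds it up
by_recycling <- colSums(df[, 1] * df[-1])

# The same weighted sums written as a matrix product: t(patients) %*% stat
by_crossprod <- drop(crossprod(as.matrix(df[-1]), df$stat))

# Both approaches should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(by_recycling, by_crossprod)))
```

The matrix form is mostly a matter of taste at this size. It only works if all patient columns are numeric, since as.matrix() would otherwise produce a character matrix.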

CodePudding user response：

I noticed that you subtracted 1, presumably so that the indices start at 0. However, unlike in Python, indices in R start at 1. So could it be that you want this?

colSums(df[-1]*df$stat)
# patient1 patient2 patient3 
#  2.97230  0.37158 -1.17381
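As a side note on the indexing in the question, here is a small sketch of what an expression like 1:nrow(df)-1 evaluates to. The values n and x are made up for illustration.

```r
n <- 6

# ':' binds more tightly than '-', so this is (1:n) - 1, not 1:(n - 1)
idx <- 1:n - 1
idx
#> [1] 0 1 2 3 4 5

# A zero index selects nothing, so the result still covers elements 1 to 5
x <- c(10, 20, 30, 40, 50, 60)
x[idx]
#> [1] 10 20 30 40 50

# Spelling out the parentheses gives the intended range directly
x[1:(n - 1)]
#> [1] 10 20 30 40 50
```

So the expression happens to pick every row but the last one, but only because R silently drops the 0. Writing 1:(nrow(df) - 1) makes that intent explicit.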

CodePudding user response：

You are overcomplicating this.
To make the code clearer, define an auxiliary function, fun, that multiplies two vectors and sums the result. Then apply that function to each patient column.

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33), 
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# auxiliary function
fun <- function(x, y) sum(x * y)

apply(df[-1], 2, fun, y = df[[1]])
#> patient1 patient2 patient3 
#>  2.97230  0.37158 -1.17381

sigscore <- apply(df[-1], 2, fun, y = df[[1]])
rbind(df, `Signature Score` = c(NA, sigscore))
#>                  stat patient1 patient2 patient3
#> gene1            3.38  -0.4400  0.40000  0.35000
#> gene2           -3.40  -0.2200  0.04500  0.21000
#> gene3            4.45   0.8000 -0.14000 -0.23000
#> gene4           -4.21  -0.2100 -0.07800 -0.01900
#> gene5            3.33  -0.2200 -0.16000 -0.21000
#> Signature Score    NA   2.9723  0.37158 -1.17381
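If you would still rather keep an explicit loop, as in the attempt from the question, one way it could be written is sketched below. It loops over the columns rather than the rows and skips the stat column. The names scores and result are only illustrative.

```r
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# One slot per column; the slot for column 1 (stat) stays NA
scores <- rep(NA_real_, ncol(df))

# Loop over the patient columns, not over the rows
for (i in 2:ncol(df)) {
  scores[i] <- sum(df[[i]] * df[[1]])
}

# Append the scores as a new row named "Signature Score"
result <- rbind(df, `Signature Score` = scores)
result
```

Computing the scores before adding the row avoids having to exclude the new row from the sums.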

Created on 2022-05-05 by the reprex package (v2.0.1)