How to calculate sum of multiplication of each column in a dataset by the first column?

Time:05-05

I have my data frame as below.

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33), 
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))
> df
       stat patient1 patient2 patient3
gene1  3.38    -0.44    0.400    0.350
gene2 -3.40    -0.22    0.045    0.210
gene3  4.45     0.80   -0.140   -0.230
gene4 -4.21    -0.21   -0.078   -0.019
gene5  3.33    -0.22   -0.160   -0.210

I have been struggling to write a script or a loop that calculates the sum of the products of the 'stat' column and each patient column. My real patient dataset has 141 columns and 142 rows, so I cannot do this by hand.

So, I would like to add a new row called "Signature Score" that holds the values calculated as follows:

df[nrow(df) + 1, ] <- NA  # append an empty row to hold the scores
row.names(df)[nrow(df)] <- "Signature Score"

sum_multi_1 <- sum(df[c(1:nrow(df)-1),2]*df[c(1:nrow(df)-1),1])
sum_multi_2 <- sum(df[c(1:nrow(df)-1),3]*df[c(1:nrow(df)-1),1])
sum_multi_3 <- sum(df[c(1:nrow(df)-1),4]*df[c(1:nrow(df)-1),1])

df[nrow(df),2] <- sum_multi_1
df[nrow(df),3] <- sum_multi_2
df[nrow(df),4] <- sum_multi_3

which is...

> df
                 stat patient1 patient2 patient3
gene1            3.38  -0.4400  0.40000  0.35000
gene2           -3.40  -0.2200  0.04500  0.21000
gene3            4.45   0.8000 -0.14000 -0.23000
gene4           -4.21  -0.2100 -0.07800 -0.01900
gene5            3.33  -0.2200 -0.16000 -0.21000
Signature Score    NA   2.9723  0.37158 -1.17381

I was trying to make a for loop something like this...

for (i in 1:nrow(df)){
  df[nrow(df),i + 1] <- sum(df[c(1:nrow(df)-1,i + 1)]*df[c(1:nrow(df)-1),1])
}

but it doesn't do the job. Can anyone please tell me what I am missing or what I need to write?

All the best, Tj

CodePudding user response:

You can use mutate and across to multiply each patient column by stat, and then add a totals row with adorn_totals() from the janitor package. Note that this overwrites the patient values with their products.

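library(dplyr)
library(tibble)  # provides rownames_to_column()

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5")) %>%
  rownames_to_column(var = "genes") %>%
  # replace each patient column with patient * stat
  mutate(across(patient1:patient3, ~ .x * stat)) %>%
  # append a row of column sums labelled "Signature Score"
  janitor::adorn_totals(name = "Signature Score")

# the total of the stat column is not meaningful here, so set it to NA
df[nrow(df), "stat"] <- NA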

Output:

           genes  stat patient1 patient2 patient3
           gene1  3.38  -1.4872  1.35200  1.18300
           gene2 -3.40   0.7480 -0.15300 -0.71400
           gene3  4.45   3.5600 -0.62300 -1.02350
           gene4 -4.21   0.8841  0.32838  0.07999
           gene5  3.33  -0.7326 -0.53280 -0.69930
 Signature Score    NA   2.9723  0.37158 -1.17381

CodePudding user response:

Another possible solution, in base R:

rbind(df, signa = c(NA,colSums(df[,1] * df[-1])))

#>        stat patient1 patient2 patient3
#> gene1  3.38  -0.4400  0.40000  0.35000
#> gene2 -3.40  -0.2200  0.04500  0.21000
#> gene3  4.45   0.8000 -0.14000 -0.23000
#> gene4 -4.21  -0.2100 -0.07800 -0.01900
#> gene5  3.33  -0.2200 -0.16000 -0.21000
#> signa    NA   2.9723  0.37158 -1.17381
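
As a rough illustration of why this one-liner works: df[,1] is a vector with one entry per row, and multiplying it by the data frame df[-1] applies it down every column, so colSums() then gives one weighted sum per patient. The sketch below rebuilds the question's data and checks the result against an equivalent matrix product; the names products, scores and scores_mat are only illustrative.

```r
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# Each patient column is multiplied element-wise by the stat vector.
products <- df[, 1] * df[-1]

# One sum per patient column.
scores <- colSums(products)

# The same weighted sums written as a matrix product: t(stat) %*% patients.
scores_mat <- drop(crossprod(df$stat, as.matrix(df[-1])))

# Compare the two results up to floating-point tolerance.
all.equal(unname(scores), unname(scores_mat))
```

The matrix-product form may be worth considering for wide data, since it avoids building the intermediate table of products.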

CodePudding user response:

I noticed that you subtracted 1, presumably so that the indices start at 0. However, unlike in Python, indices in R start at 1. Could this be what you want?

colSums(df[-1]*df$stat)
# patient1 patient2 patient3 
#  2.97230  0.37158 -1.17381 
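
To make the indexing point concrete, here is a small sketch. In R the `:` operator binds tighter than `-`, so 1:n-1 means (1:n)-1 and not 1:(n-1), and an index of 0 is silently dropped. The second half shows one way the loop from the question could be written so that it iterates over the patient columns instead of the rows; the names idx_a, idx_b, scores and result are only illustrative.

```r
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

n <- nrow(df)
idx_a <- 1:n - 1     # parsed as (1:n) - 1, i.e. 0 1 2 3 4
idx_b <- 1:(n - 1)   # 1 2 3 4

# Loop over the patient columns (2 to ncol), not over the rows.
scores <- setNames(numeric(ncol(df) - 1), names(df)[-1])
for (j in 2:ncol(df)) {
  scores[j - 1] <- sum(df[[j]] * df$stat)
}

# Append the scores as a new row; stat has no score, so it gets NA.
result <- rbind(df, `Signature Score` = c(NA, scores))
```

The loop gives the same values as the colSums() call above; the vectorised form is shorter, but the loop may be easier to follow when coming from other languages.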

CodePudding user response:

You are overcomplicating this.
To make the code clearer, define an auxiliary function fun that multiplies two vectors and sums the result, then apply it to each patient column.

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33), 
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# auxiliary function
fun <- function(x, y) sum(x * y)

apply(df[-1], 2, fun, y = df[[1]])
#> patient1 patient2 patient3 
#>  2.97230  0.37158 -1.17381

sigscore <- apply(df[-1], 2, fun, y = df[[1]])
rbind(df, `Signature Score` = c(NA, sigscore))
#>                  stat patient1 patient2 patient3
#> gene1            3.38  -0.4400  0.40000  0.35000
#> gene2           -3.40  -0.2200  0.04500  0.21000
#> gene3            4.45   0.8000 -0.14000 -0.23000
#> gene4           -4.21  -0.2100 -0.07800 -0.01900
#> gene5            3.33  -0.2200 -0.16000 -0.21000
#> Signature Score    NA   2.9723  0.37158 -1.17381

Created on 2022-05-05 by the reprex package (v2.0.1)
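
A possible variant of the same idea, offered as a sketch and not as a correction: apply() first converts the data frame to a matrix, whereas vapply() iterates over the columns directly and checks that each call returns a single number. The helper fun is the one defined above; sigscore2 is an illustrative name.

```r
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# same auxiliary function as above
fun <- function(x, y) sum(x * y)

# One call per patient column; numeric(1) declares the expected return type.
sigscore2 <- vapply(df[-1], fun, numeric(1), y = df[[1]])
sigscore2
```

Because the column selection is df[-1], this should carry over unchanged to a data set with many more patient columns.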
