How can I divide all rows by the first row and calculate the standard deviation?

Time:09-30

I have data like this:

df<-structure(list(R1 = c(512L, 44620L, 69500L, 91120L, 98870L), 
    R2 = c(587L, 38500L, 67370L, 94870L, 88120L), R3 = c(587L, 
    39370L, 57500L, 96870L, 85370L), R1.1 = c(737L, 2812L, 4050L, 
    6400L, 4762L), R2.1 = c(450L, 2587L, 3900L, 7287L, 5550L), 
    R3.1 = c(712L, 2175L, 4675L, 6687L, 4125L)), class = "data.frame", row.names = c(NA, 
-5L))

I am trying to do a calculation on this data:

    R1    R2    R3 R1.1 R2.1 R3.1
1   512   587   587  737  450  712
2 44620 38500 39370 2812 2587 2175
3 69500 67370 57500 4050 3900 4675
4 91120 94870 96870 6400 7287 6687
5 98870 88120 85370 4762 5550 4125

I can calculate it manually like this:

((mean(c(512,587,587))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(44620,38500,39370))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(69500,67370,57500))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(91120,94870,96870))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(98870,88120,85370))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(2812,2587,2175))-(mean(c(737,450,712))))- 1822.9)/4167.5
((mean(c(4050,3900,4675))-(mean(c(737,450,712))))- 1822.9)/4167.5
((mean(c(6400,7287,6687))-(mean(c(737,450,712))))- 1822.9)/4167.5
((mean(c(4762,5550,4125))-(mean(c(737,450,712))))- 1822.9)/4167.5

However, I want to do it in an easier way and also add the SD for each calculation.

Take the mean of the second row across the R1, R2 and R3 columns, subtract the mean of the first row of those three columns, then subtract a constant (1822.9) and divide by another constant (4167.5). I do the same for the third, fourth and fifth rows. Then I do the same for the second set of columns: R1.1, R2.1 and R3.1.

I can calculate the SD like this:

(sd(c(512,587,587))-1822.9)/4167.5
(sd(c(44620,38500,39370))-1822.9)/4167.5
(sd(c(69500,67370,57500))-1822.9)/4167.5
(sd(c(91120,94870,96870))-1822.9)/4167.5
(sd(c(98870,88120,85370))-1822.9)/4167.5
(sd(c(2812,2587,2175))-1822.9)/4167.5
(sd(c(4050,3900,4675))-1822.9)/4167.5
(sd(c(6400,7287,6687))-1822.9)/4167.5
(sd(c(4762,5550,4125))-1822.9)/4167.5

So the expected output, including the SD, looks like this:

data        SD
-0.4374085  -0.43
9.224979    0.35
14.97423    1.09
22.05201    0.26
21.21218    1.27
0.0165007   -0.36
0.4204999   -0.34
1.040296    -0.32
0.5654309   -0.26

Answer:

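Here is an option using vectorised base R functions where possible (avoiding explicit loops). Note that the \(sel) lambda shorthand needs R 4.1.0 or later; on older versions, write function(sel) instead.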

do.call(rbind, lapply(
    list(1:3, 4:6), 
    \(sel) data.frame(
        data = (rowMeans(df[sel]) - rowMeans(df[sel])[1] - 1822.9) / 4167.5,
        SD = (apply(df[sel], 1, sd) - 1822.9) / 4167.5)))
#         data         SD
#1  -0.4374085 -0.4270183
#2   9.2249790  0.3570573
#3  14.9742292  1.0988897
#4  22.0520136  0.2630226
#5  21.2121816  1.2744407
#6  -0.4374085 -0.3992622
#7   0.0165007 -0.3598939
#8   0.4204999 -0.3387773
#9   1.0402959 -0.3288037
#10  0.5654309 -0.2661231

The idea is to split the source data.frame df into two chunks (columns 1-3 and columns 4-6), then iterate over those two chunks with lapply(), calculate the required quantities for each, and row-bind the results using do.call(rbind, ...). Base R doesn't have a rowSd() function, so we implement this through apply(..., 1, sd). You can also use matrixStats::rowSds() if you don't mind an additional dependency; it is typically faster than the apply(..., 1, sd) solution.
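If it helps, the same split, summarise and row-bind idea can be wrapped in a small helper so the two constants are named rather than repeated. This is only a sketch: the names summarise_block, offset and scale are my own and not part of the original answer, and the data frame is rebuilt here with data.frame() so the snippet runs on its own.

```r
# Example data, copied from the question
df <- data.frame(
  R1   = c(512, 44620, 69500, 91120, 98870),
  R2   = c(587, 38500, 67370, 94870, 88120),
  R3   = c(587, 39370, 57500, 96870, 85370),
  R1.1 = c(737, 2812, 4050, 6400, 4762),
  R2.1 = c(450, 2587, 3900, 7287, 5550),
  R3.1 = c(712, 2175, 4675, 6687, 4125)
)

# Hypothetical helper (names are illustrative): summarise one block of
# replicate columns, using the constants from the question as defaults
summarise_block <- function(block, offset = 1822.9, scale = 4167.5) {
  m <- rowMeans(block)       # per-row mean across the replicate columns
  s <- apply(block, 1, sd)   # per-row standard deviation
  data.frame(
    data = (m - m[1] - offset) / scale,  # subtract first-row mean, then rescale
    SD   = (s - offset) / scale
  )
}

# Column positions of the two replicate groups
groups <- list(1:3, 4:6)
res <- do.call(rbind, lapply(groups, function(sel) summarise_block(df[sel])))

# Spot check against the manual calculation for row 2 of the first group
manual <- (mean(c(44620, 38500, 39370)) - mean(c(512, 587, 587)) - 1822.9) / 4167.5
stopifnot(isTRUE(all.equal(unname(res$data[2]), manual)))
```

Using function(sel) instead of \(sel) keeps the sketch usable on R versions before 4.1.0. Like the code in the answer, res has one row per input row in each group, so the first row of each group is included.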


Using matrixStats::rowSds():

do.call(rbind, lapply(
    list(1:3, 4:6), 
    \(sel) data.frame(
        data = (rowMeans(df[sel]) - rowMeans(df[sel])[1] - 1822.9) / 4167.5,
        SD = (matrixStats::rowSds(as.matrix(df[sel])) - 1822.9) / 4167.5)))
Tags: r