I have data like this:
df<-structure(list(R1 = c(512L, 44620L, 69500L, 91120L, 98870L),
R2 = c(587L, 38500L, 67370L, 94870L, 88120L), R3 = c(587L,
39370L, 57500L, 96870L, 85370L), R1.1 = c(737L, 2812L, 4050L,
6400L, 4762L), R2.1 = c(450L, 2587L, 3900L, 7287L, 5550L),
R3.1 = c(712L, 2175L, 4675L, 6687L, 4125L)), class = "data.frame", row.names = c(NA,
-5L))
Printed, the data frame looks like this:
     R1    R2    R3 R1.1 R2.1 R3.1
1   512   587   587  737  450  712
2 44620 38500 39370 2812 2587 2175
3 69500 67370 57500 4050 3900 4675
4 91120 94870 96870 6400 7287 6687
5 98870 88120 85370 4762 5550 4125
I can calculate what I need by hand like this:
((mean(c(512,587,587))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(44620,38500,39370))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(69500,67370,57500))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(91120,94870,96870))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(98870,88120,85370))-(mean(c(512,587,587))))- 1822.9)/4167.5
((mean(c(2812,2587,2175))-(mean(c(737,450,712))))- 1822.9)/4167.5
((mean(c(4050,3900,4675))-(mean(c(737,450,712))))- 1822.9)/4167.5
((mean(c(6400,7287,6687))-(mean(c(737,450,712))))- 1822.9)/4167.5
((mean(c(4762,5550,4125))-(mean(c(737,450,712))))- 1822.9)/4167.5
However, I want to do it in an easier way and also add the SD for each calculation.
That is, take the mean of the second row across the R1, R2 and R3 columns, subtract the mean of the first row of those three columns, subtract a constant (1822.9), and divide by another constant (4167.5). I do the same for the third, fourth and fifth rows, and then repeat everything for the second set of columns: R1.1, R2.1 and R3.1.
I can calculate the SD like this:
(sd(c(512,587,587))-1822.9)/4167.5
(sd(c(44620,38500,39370))-1822.9)/4167.5
(sd(c(69500,67370,57500))-1822.9)/4167.5
(sd(c(91120,94870,96870))-1822.9)/4167.5
(sd(c(98870,88120,85370))-1822.9)/4167.5
(sd(c(2812,2587,2175))-1822.9)/4167.5
(sd(c(4050,3900,4675))-1822.9)/4167.5
(sd(c(6400,7287,6687))-1822.9)/4167.5
(sd(c(4762,5550,4125))-1822.9)/4167.5
So the output, including the SD, should look like this:
data SD
-0.4374085 -0.43
9.224979 0.35
14.97423 1.09
22.05201 0.26
21.21218 1.27
0.0165007 -0.36
0.4204999 -0.34
1.040296 -0.32
0.5654309 -0.26
CodePudding user response:
Here is an option using vectorised base R functions where possible (avoiding explicit for loops):
do.call(rbind, lapply(
    list(1:3, 4:6),
    \(sel) data.frame(
        data = (rowMeans(df[sel]) - rowMeans(df[sel])[1] - 1822.9) / 4167.5,
        SD = (apply(df[sel], 1, sd) - 1822.9) / 4167.5)))
# data SD
#1 -0.4374085 -0.4270183
#2 9.2249790 0.3570573
#3 14.9742292 1.0988897
#4 22.0520136 0.2630226
#5 21.2121816 1.2744407
#6 -0.4374085 -0.3992622
#7 0.0165007 -0.3598939
#8 0.4204999 -0.3387773
#9 1.0402959 -0.3288037
#10 0.5654309 -0.2661231
The idea is to split the source data.frame df into two chunks (consisting of columns 1-3 and columns 4-6), then iterate over those two chunks with lapply(), calculate the quantities needed and row-bind the results using do.call(rbind, ...). Base R doesn't have a rowSd() function, so we implement this through apply(..., 1, sd); you can also use matrixStats::rowSds() if you don't mind an additional dependency (matrixStats::rowSds() will be more performant than the apply(..., 1, sd) solution).
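As a quick sanity check of the approach described above, the following sketch recomputes row 2 of the first chunk both ways: once with rowMeans() and apply(..., 1, sd), and once written out by hand as in the question. The variable names (sel, row_mean, row_sd, vec_data, vec_sd, hand_data, hand_sd) are only illustrative, and unname() is used so that the comparison does not depend on whether the intermediate vectors carry row names.

```r
# Same data as in the question.
df <- data.frame(
  R1   = c(512L, 44620L, 69500L, 91120L, 98870L),
  R2   = c(587L, 38500L, 67370L, 94870L, 88120L),
  R3   = c(587L, 39370L, 57500L, 96870L, 85370L),
  R1.1 = c(737L, 2812L, 4050L, 6400L, 4762L),
  R2.1 = c(450L, 2587L, 3900L, 7287L, 5550L),
  R3.1 = c(712L, 2175L, 4675L, 6687L, 4125L)
)

sel      <- 1:3                              # first chunk: R1, R2, R3
row_mean <- unname(rowMeans(df[sel]))        # one mean per row
row_sd   <- unname(apply(df[sel], 1, sd))    # one SD per row

# Vectorised results for row 2 of the first chunk
vec_data <- (row_mean[2] - row_mean[1] - 1822.9) / 4167.5
vec_sd   <- (row_sd[2] - 1822.9) / 4167.5

# The same two numbers written out by hand, as in the question
hand_data <- (mean(c(44620, 38500, 39370)) - mean(c(512, 587, 587)) - 1822.9) / 4167.5
hand_sd   <- (sd(c(44620, 38500, 39370)) - 1822.9) / 4167.5

# Compare the two routes up to floating-point tolerance
isTRUE(all.equal(vec_data, hand_data))
isTRUE(all.equal(vec_sd, hand_sd))
```

If the two routes agree, both comparisons return TRUE, which confirms that the row-wise functions reproduce the manual calculation.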
Using matrixStats::rowSds():
library(matrixStats)
do.call(rbind, lapply(
    list(1:3, 4:6),
    \(sel) data.frame(
        data = (rowMeans(df[sel]) - rowMeans(df[sel])[1] - 1822.9) / 4167.5,
        SD = (rowSds(as.matrix(df[sel])) - 1822.9) / 4167.5)))
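Because the row-bound result stacks both chunks into ten rows, it can be hard to tell which column set a row came from. One possible variation, sketched here in base R only, wraps the calculation in a small helper and adds a label column. The helper name scale_rows, its offset and divisor arguments, the set1/set2 labels and the group and row columns are all assumptions made for illustration, not part of the original answer; function(g) is used instead of \(g) so the sketch also runs on R versions before 4.1.

```r
# Same data as in the question.
df <- data.frame(
  R1   = c(512L, 44620L, 69500L, 91120L, 98870L),
  R2   = c(587L, 38500L, 67370L, 94870L, 88120L),
  R3   = c(587L, 39370L, 57500L, 96870L, 85370L),
  R1.1 = c(737L, 2812L, 4050L, 6400L, 4762L),
  R2.1 = c(450L, 2587L, 3900L, 7287L, 5550L),
  R3.1 = c(712L, 2175L, 4675L, 6687L, 4125L)
)

# Hypothetical helper; the defaults are the two constants from the question.
scale_rows <- function(chunk, offset = 1822.9, divisor = 4167.5) {
  m <- unname(rowMeans(chunk))          # row means of this chunk
  s <- unname(apply(chunk, 1, sd))      # row standard deviations
  data.frame(
    row  = seq_len(nrow(chunk)),
    data = (m - m[1] - offset) / divisor,  # mean relative to the first row
    SD   = (s - offset) / divisor
  )
}

# Named list of column positions; the names become the labels.
groups <- list(set1 = 1:3, set2 = 4:6)

res <- do.call(rbind, lapply(names(groups), function(g) {
  data.frame(group = g, scale_rows(df[groups[[g]]]))
}))
rownames(res) <- NULL
res
```

Keeping the constants as arguments means the same helper can be reused if the offset or divisor changes, and adding another column set only requires another entry in groups.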