I'm trying to find a way to do something which is probably quite simple. I want to get the average values and standard deviations of "A", "B" and "C" for each day in the following dataset:
M <- c("A", "A", "A", "A", "A", "B","B","B","B", "B", "C", "C", "C","C", "C" )
DCol <- c("19800101","19800102", "19800103", "19800104", "19800105","19800101","19800102", "19800103", "19800104", "19800105","19800101","19800102", "19800103", "19800104", "19800105")
V1 <- c(-6.8,-6.5,-6.05,-6.5,-5.2,-7.08,-5.7,-4.6,-4.6,-6.8,-6.5,-6.05,-6.5,-5.2, -7.06)
V2 <- c(-11.04,-11.1,-10.9,-10.6,-9.6,-11.6,-11.6,-9.7,-8.8,-11.1,-10.9,-10.6,-9.6,-11.6, -10.0)
V3 <- c(1.1,1.3,1.8,1.6,0.6,1.1,1.3,1.5,1.7,0.6,1.1,1.3,1.5,1.7, 1.1)
df <- data.frame(M, DCol, V1, V2, V3)
df Where M is a climate model, DCol is a series of dates, and V 1:V3 the results by model. So the data frame looks as follows:
M DCol V1 V2 V3
[1,] "A" "19800101" "-6.8" "-11.04" "1.1"
[2,] "A" "19800102" "-6.5" "-11.1" "1.3"
[3,] "A" "19800103" "-6.05" "-10.9" "1.8"
[4,] "A" "19800104" "-6.5" "-10.6" "1.6"
[5,] "A" "19800105" "-5.2" "-9.6" "0.6"
[6,] "B" "19800101" "-7.08" "-11.6" "1.1"
[7,] "B" "19800102" "-5.7" "-11.6" "1.3"
[8,] "B" "19800103" "-4.6" "-9.7" "1.5"
[9,] "B" "19800104" "-4.6" "-8.8" "1.7"
[10,] "B" "19800105" "-6.8" "-11.1" "0.6"
[11,] "C" "19800101" "-6.5" "-10.9" "1.1"
[12,] "C" "19800102" "-6.05" "-10.6" "1.3"
[13,] "C" "19800103" "-6.5" "-9.6" "1.5"
[14,] "C" "19800104" "-5.2" "-11.6" "1.7"
[15,] "C" "19800105" "-7.06" "-10" "1.1"
The resulting output in this instance would be a five row dataset with DCol, V1, V2, V3, and if possible standard deviations in adjacent columns.
Date period Model RCP Date meanTemp maxTemp minTemp precipitation windSpeed rad humidity
101908 2 HadGEM2-ES 26 19800101 -6.60 -5.9 -7.3 0.04 0.8217593 8.101852 100.0
101909 2 HadGEM2-ES 26 19800102 -6.20 -5.0 -7.4 0.08 2.2453704 9.259259 100.0
101910 2 HadGEM2-ES 26 19800103 -5.70 -5.0 -6.4 0.28 1.9444444 8.101852 94.7
101911 2 HadGEM2-ES 26 19800104 -5.70 -5.0 -6.4 0.08 1.0416667 8.101852 97.5
101912 2 HadGEM2-ES 26 19800105 -6.20 -5.0 -7.4 0.00 1.1226852 9.259259 98.5
A sample of the whole dataset, I have to say I don't understand the aggregate function well enough to know why it isn't working here. Thanks in advance.
CodePudding user response:
You can use aggregate()
from base R i.e.,
aggregate(.~DCol, df[-1], FUN = function(x) c(avg = mean(x), sd = sd(x)))
DCol V1.avg V1.sd V2.avg V2.sd V3.avg V3.sd
1 19800101 -6.7933333 0.2900575 -11.1800000 0.3704052 1.10000000 0.00000000
2 19800102 -6.0833333 0.4010403 -11.1000000 0.5000000 1.30000000 0.00000000
3 19800103 -5.7166667 0.9928914 -10.0666667 0.7234178 1.60000000 0.17320508
4 19800104 -5.4333333 0.9712535 -10.3333333 1.4189198 1.66666667 0.05773503
5 19800105 -6.3533333 1.0072405 -10.2333333 0.7767453 0.76666667 0.28867513
CodePudding user response:
Update after clarification:
df %>%
as_tibble() %>%
type.convert(as.is = TRUE) %>%
dplyr::group_by(DCol) %>%
summarise(across(c(V1, V2, V3), list(mean = mean, sd = sd), .names = "{col}_{fn}"))
DCol V1_mean V1_sd V2_mean V2_sd V3_mean V3_sd
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 19800101 -6.79 0.290 -11.2 0.370 1.1 0
2 19800102 -6.08 0.401 -11.1 0.5 1.3 0
3 19800103 -5.72 0.993 -10.1 0.723 1.6 0.173
4 19800104 -5.43 0.971 -10.3 1.42 1.67 0.0577
5 19800105 -6.35 1.01 -10.2 0.777 0.767 0.289
First answer:
We could do something like this:
First transform to tibble (df is matrix) with as_tibble
,
then transform V1:V3 to numeric with type.convert(as.is=TRUE)
, and finally use mutate with across:
library(dplyr)
library(tibble)
df %>%
as_tibble() %>%
type.convert(as.is = TRUE) %>%
group_by(M) %>%
mutate(across(c(V1, V2, V3), list(mean = mean, sd = sd), .names = "{col}_{fn}"))
M DCol V1 V2 V3 V1_mean V1_sd V2_mean V2_sd V3_mean V3_sd
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 19800101 -6.8 -11.0 1.1 -6.21 0.625 -10.6 0.617 1.28 0.466
2 A 19800102 -6.5 -11.1 1.3 -6.21 0.625 -10.6 0.617 1.28 0.466
3 A 19800103 -6.05 -10.9 1.8 -6.21 0.625 -10.6 0.617 1.28 0.466
4 A 19800104 -6.5 -10.6 1.6 -6.21 0.625 -10.6 0.617 1.28 0.466
5 A 19800105 -5.2 -9.6 0.6 -6.21 0.625 -10.6 0.617 1.28 0.466
6 B 19800101 -7.08 -11.6 1.1 -5.76 1.17 -10.6 1.25 1.24 0.422
7 B 19800102 -5.7 -11.6 1.3 -5.76 1.17 -10.6 1.25 1.24 0.422
8 B 19800103 -4.6 -9.7 1.5 -5.76 1.17 -10.6 1.25 1.24 0.422
9 B 19800104 -4.6 -8.8 1.7 -5.76 1.17 -10.6 1.25 1.24 0.422
10 B 19800105 -6.8 -11.1 0.6 -5.76 1.17 -10.6 1.25 1.24 0.422
11 C 19800101 -6.5 -10.9 1.1 -6.26 0.693 -10.5 0.780 1.34 0.261
12 C 19800102 -6.05 -10.6 1.3 -6.26 0.693 -10.5 0.780 1.34 0.261
13 C 19800103 -6.5 -9.6 1.5 -6.26 0.693 -10.5 0.780 1.34 0.261
14 C 19800104 -5.2 -11.6 1.7 -6.26 0.693 -10.5 0.780 1.34 0.261
15 C 19800105 -7.06 -10 1.1 -6.26 0.693 -10.5 0.780 1.34 0.261
CodePudding user response:
The slightly-tweaked tidyverse format would be:
library(tidyverse)
df %>%
group_by(DCol) %>%
summarise(across(c(V1, V2, V3), list(mean = mean, sd = sd), .names = "{col}_{fn}"))
#> # A tibble: 5 × 7
#> DCol V1_mean V1_sd V2_mean V2_sd V3_mean V3_sd
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 19800101 -6.79 0.290 -11.2 0.370 1.1 0
#> 2 19800102 -6.08 0.401 -11.1 0.5 1.3 0
#> 3 19800103 -5.72 0.993 -10.1 0.723 1.6 0.173
#> 4 19800104 -5.43 0.971 -10.3 1.42 1.67 0.0577
#> 5 19800105 -6.35 1.01 -10.2 0.777 0.767 0.289