Home > Blockchain >  How to get means and standard deviations numeric data based on a character column?
How to get means and standard deviations numeric data based on a character column?

Time:11-21

I'm trying to find a way to do something which is probably quite simple. I want to get the average values and standard deviations of "A", "B" and "C" for each day in the following dataset:

M <- c("A", "A", "A", "A", "A", "B","B","B","B", "B", "C", "C", "C","C", "C" )
DCol <- c("19800101","19800102", "19800103", "19800104", "19800105","19800101","19800102", "19800103", "19800104", "19800105","19800101","19800102", "19800103", "19800104", "19800105")
V1 <- c(-6.8,-6.5,-6.05,-6.5,-5.2,-7.08,-5.7,-4.6,-4.6,-6.8,-6.5,-6.05,-6.5,-5.2, -7.06)
V2 <- c(-11.04,-11.1,-10.9,-10.6,-9.6,-11.6,-11.6,-9.7,-8.8,-11.1,-10.9,-10.6,-9.6,-11.6, -10.0)
V3 <- c(1.1,1.3,1.8,1.6,0.6,1.1,1.3,1.5,1.7,0.6,1.1,1.3,1.5,1.7, 1.1)
df <- data.frame(M, DCol, V1, V2, V3)

df Where M is a climate model, DCol is a series of dates, and V 1:V3 the results by model. So the data frame looks as follows:

       M   DCol       V1      V2       V3   
 [1,] "A" "19800101" "-6.8"  "-11.04" "1.1"
 [2,] "A" "19800102" "-6.5"  "-11.1"  "1.3"
 [3,] "A" "19800103" "-6.05" "-10.9"  "1.8"
 [4,] "A" "19800104" "-6.5"  "-10.6"  "1.6"
 [5,] "A" "19800105" "-5.2"  "-9.6"   "0.6"
 [6,] "B" "19800101" "-7.08" "-11.6"  "1.1"
 [7,] "B" "19800102" "-5.7"  "-11.6"  "1.3"
 [8,] "B" "19800103" "-4.6"  "-9.7"   "1.5"
 [9,] "B" "19800104" "-4.6"  "-8.8"   "1.7"
[10,] "B" "19800105" "-6.8"  "-11.1"  "0.6"
[11,] "C" "19800101" "-6.5"  "-10.9"  "1.1"
[12,] "C" "19800102" "-6.05" "-10.6"  "1.3"
[13,] "C" "19800103" "-6.5"  "-9.6"   "1.5"
[14,] "C" "19800104" "-5.2"  "-11.6"  "1.7"
[15,] "C" "19800105" "-7.06" "-10"    "1.1"

The resulting output in this instance would be a five row dataset with DCol, V1, V2, V3, and if possible standard deviations in adjacent columns.

    Date   period      Model RCP     Date meanTemp maxTemp minTemp precipitation windSpeed        rad humidity
    101908      2 HadGEM2-ES  26 19800101    -6.60    -5.9    -7.3          0.04 0.8217593   8.101852    100.0
    101909      2 HadGEM2-ES  26 19800102    -6.20    -5.0    -7.4          0.08 2.2453704   9.259259    100.0
    101910      2 HadGEM2-ES  26 19800103    -5.70    -5.0    -6.4          0.28 1.9444444   8.101852     94.7
    101911      2 HadGEM2-ES  26 19800104    -5.70    -5.0    -6.4          0.08 1.0416667   8.101852     97.5
    101912      2 HadGEM2-ES  26 19800105    -6.20    -5.0    -7.4          0.00 1.1226852   9.259259     98.5

A sample of the whole dataset, I have to say I don't understand the aggregate function well enough to know why it isn't working here. Thanks in advance.

CodePudding user response:

You can use aggregate() from base R i.e.,

aggregate(.~DCol, df[-1], FUN = function(x) c(avg = mean(x), sd = sd(x)))

      DCol     V1.avg      V1.sd      V2.avg       V2.sd     V3.avg      V3.sd
1 19800101 -6.7933333  0.2900575 -11.1800000   0.3704052 1.10000000 0.00000000
2 19800102 -6.0833333  0.4010403 -11.1000000   0.5000000 1.30000000 0.00000000
3 19800103 -5.7166667  0.9928914 -10.0666667   0.7234178 1.60000000 0.17320508
4 19800104 -5.4333333  0.9712535 -10.3333333   1.4189198 1.66666667 0.05773503
5 19800105 -6.3533333  1.0072405 -10.2333333   0.7767453 0.76666667 0.28867513

CodePudding user response:

Update after clarification:

df %>% 
  as_tibble() %>% 
  type.convert(as.is = TRUE) %>% 
  dplyr::group_by(DCol) %>%
  summarise(across(c(V1, V2, V3), list(mean = mean, sd = sd), .names = "{col}_{fn}"))

     DCol V1_mean V1_sd V2_mean V2_sd V3_mean  V3_sd
     <int>   <dbl> <dbl>   <dbl> <dbl>   <dbl>  <dbl>
1 19800101   -6.79 0.290   -11.2 0.370   1.1   0     
2 19800102   -6.08 0.401   -11.1 0.5     1.3   0     
3 19800103   -5.72 0.993   -10.1 0.723   1.6   0.173 
4 19800104   -5.43 0.971   -10.3 1.42    1.67  0.0577
5 19800105   -6.35 1.01    -10.2 0.777   0.767 0.289

First answer:

We could do something like this: First transform to tibble (df is matrix) with as_tibble, then transform V1:V3 to numeric with type.convert(as.is=TRUE), and finally use mutate with across:

library(dplyr)
library(tibble)

df %>% 
  as_tibble() %>% 
  type.convert(as.is = TRUE) %>% 
  group_by(M) %>%
  mutate(across(c(V1, V2, V3), list(mean = mean, sd = sd), .names = "{col}_{fn}"))
  M         DCol    V1    V2    V3 V1_mean V1_sd V2_mean V2_sd V3_mean V3_sd
   <chr>    <int> <dbl> <dbl> <dbl>   <dbl> <dbl>   <dbl> <dbl>   <dbl> <dbl>
 1 A     19800101 -6.8  -11.0   1.1   -6.21 0.625   -10.6 0.617    1.28 0.466
 2 A     19800102 -6.5  -11.1   1.3   -6.21 0.625   -10.6 0.617    1.28 0.466
 3 A     19800103 -6.05 -10.9   1.8   -6.21 0.625   -10.6 0.617    1.28 0.466
 4 A     19800104 -6.5  -10.6   1.6   -6.21 0.625   -10.6 0.617    1.28 0.466
 5 A     19800105 -5.2   -9.6   0.6   -6.21 0.625   -10.6 0.617    1.28 0.466
 6 B     19800101 -7.08 -11.6   1.1   -5.76 1.17    -10.6 1.25     1.24 0.422
 7 B     19800102 -5.7  -11.6   1.3   -5.76 1.17    -10.6 1.25     1.24 0.422
 8 B     19800103 -4.6   -9.7   1.5   -5.76 1.17    -10.6 1.25     1.24 0.422
 9 B     19800104 -4.6   -8.8   1.7   -5.76 1.17    -10.6 1.25     1.24 0.422
10 B     19800105 -6.8  -11.1   0.6   -5.76 1.17    -10.6 1.25     1.24 0.422
11 C     19800101 -6.5  -10.9   1.1   -6.26 0.693   -10.5 0.780    1.34 0.261
12 C     19800102 -6.05 -10.6   1.3   -6.26 0.693   -10.5 0.780    1.34 0.261
13 C     19800103 -6.5   -9.6   1.5   -6.26 0.693   -10.5 0.780    1.34 0.261
14 C     19800104 -5.2  -11.6   1.7   -6.26 0.693   -10.5 0.780    1.34 0.261
15 C     19800105 -7.06 -10     1.1   -6.26 0.693   -10.5 0.780    1.34 0.261

CodePudding user response:

The slightly-tweaked tidyverse format would be:

library(tidyverse)


df %>% 
  group_by(DCol) %>%
  summarise(across(c(V1, V2, V3), list(mean = mean, sd = sd), .names = "{col}_{fn}"))
#> # A tibble: 5 × 7
#>   DCol     V1_mean V1_sd V2_mean V2_sd V3_mean  V3_sd
#>   <chr>      <dbl> <dbl>   <dbl> <dbl>   <dbl>  <dbl>
#> 1 19800101   -6.79 0.290   -11.2 0.370   1.1   0     
#> 2 19800102   -6.08 0.401   -11.1 0.5     1.3   0     
#> 3 19800103   -5.72 0.993   -10.1 0.723   1.6   0.173 
#> 4 19800104   -5.43 0.971   -10.3 1.42    1.67  0.0577
#> 5 19800105   -6.35 1.01    -10.2 0.777   0.767 0.289
  •  Tags:  
  • r
  • Related