How to calculate the standard error of all variables by group

Time:04-14

I have a data frame containing these variables:

       Group   high  weigh  age  col5
row1   A       12    57     18   AA
row2   C       22    80     29   BB
row3   B       17    70     20   CC
row4   A       13    60     26   DD
row5   D       19    69     25   AA
row6   B       10    15     19   BB
row7   C       20    66     22   CC
row8   D       13    53     18   DD

I want to calculate the standard error of all quantitative variables by group (the first column), either with the function std.error from the package plotrix or with another method, such as computing sd(x)/sqrt(length(x)) directly. The result I want looks like this (the values are only placeholders):

       Group   se_high   se_weigh   se_age
row1   A       0.223     0.023      0.1
row3   B       0.12      0.1        0.12
row7   C       0.1       0.04       0.09
row8   D       0.05      0.12       0.07

I tried to use the dplyr function group_by to group by the first column and then apply std.error, but I don't know how to combine them.

# This is the dplyr code that calculates the mean by group
library(dplyr)
data %>%
  group_by(Group) %>%
  summarise_at(vars(high, weigh, age), mean)

I would also like to know how to calculate std.error by two grouping columns (for example, column 1 and the last column, col5).

Thank you

CodePudding user response:

You were close! summarise_at is superseded in current dplyr (across is the recommended replacement), so here's what I'd do:

library(dplyr)
data %>%
  group_by(Group) %>%
  summarize(se_high=plotrix::std.error(high),
            se_weigh=plotrix::std.error(weigh),
            se_age=plotrix::std.error(age))

which returns

# A tibble: 4 x 4
  Group se_high se_weigh se_age
  <chr>   <dbl>    <dbl>  <dbl>
1 A         0.5      1.5    4  
2 B         3.5     27.5    0.5
3 C         1        7      3.5
4 D         3        8      3.5
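
To reproduce this, the sample data has to exist as a data frame first. A minimal sketch, assuming the table in the question is stored in an object called data (row names left out) and using a small helper se that is not part of any package, just the sd/sqrt(n) formula from the question:

```r
# Rebuild the sample data from the question (row names omitted).
data <- data.frame(
  Group = c("A", "C", "B", "A", "D", "B", "C", "D"),
  high  = c(12, 22, 17, 13, 19, 10, 20, 13),
  weigh = c(57, 80, 70, 60, 69, 15, 66, 53),
  age   = c(18, 29, 20, 26, 25, 19, 22, 18),
  col5  = c("AA", "BB", "CC", "DD", "AA", "BB", "CC", "DD")
)

# Hypothetical helper: standard error of the mean, sd divided by sqrt(n).
se <- function(x) sd(x) / sqrt(length(x))

# Hand check for group A, where high is 12 and 13.
high_a <- data$high[data$Group == "A"]
se_high_a <- se(high_a)
print(se_high_a)  # 0.5, matching the first row of the tibble above
```

With only two rows per group, the standard error is simply half the absolute difference between the two values, which makes the table easy to verify by eye.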

CodePudding user response:

Here is a solution to do it in one go:

library(dplyr)

data %>%
  group_by(Group) %>%
  summarise(across(where(is.numeric), ~ sd(.x)/ sqrt(length(.x)), .names = "std_{.col}"))

# A tibble: 4 x 4
  Group std_high std_weigh std_age
  <chr>    <dbl>     <dbl>   <dbl>
1 A          0.5       1.5     4  
2 B          3.5      27.5     0.5
3 C          1         7       3.5
4 D          3         8       3.5
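
For the second part of the question (grouping by two columns), the same pattern should carry over by passing both columns to group_by. This is only a sketch: the object name data and the se_ prefix are my own choices, and .groups = "drop" is there just to return an ungrouped tibble.

```r
library(dplyr)

# Rebuild the sample data from the question (row names omitted).
data <- data.frame(
  Group = c("A", "C", "B", "A", "D", "B", "C", "D"),
  high  = c(12, 22, 17, 13, 19, 10, 20, 13),
  weigh = c(57, 80, 70, 60, 69, 15, 66, 53),
  age   = c(18, 29, 20, 26, 25, 19, 22, 18),
  col5  = c("AA", "BB", "CC", "DD", "AA", "BB", "CC", "DD")
)

# Standard error of every numeric column for each Group/col5 combination.
se_by_two <- data %>%
  group_by(Group, col5) %>%
  summarise(
    across(where(is.numeric), ~ sd(.x) / sqrt(length(.x)), .names = "se_{.col}"),
    .groups = "drop"
  )

print(se_by_two)
```

Note that in the sample data every Group/col5 combination occurs exactly once, so sd returns NA and all standard errors come out as NA. The call only gives useful numbers when each combination has at least two rows.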