Home > database >  Using dplyr to group the new calculations into one data frame
Using dplyr to group the new calculations into one data frame

Time:10-14

I have the following table and I have to obtain a standard deviation of y for each unique value of x.

  ID    x   y

   1    1   4   
   2    2   3   
   3    3   7   
   4    1   2   
   5    2   6   
   6    3   8   

For example, each unique value of x, I have y=4 and y=2, so the standard deviation will be:

x1 <- c(4,2)
sd(x1) 
#output is 1.41

x2 <-c(3,6)
sd(x2)
#output is 2.21

x3 <-c(3,6)
sd(x3)
#output is 0.71

Instead of getting each output and put it in a data frame using the long way, is there a way to do it faster using dplyr and the pipe? I tried to use mutate and group_by, but it doesn't seem to work. I would like the result to look the following with count_y (# of y values to each unique x)

x   count_y  Std_Dev
    
1   2        1.41
2   2        2.21
3   2        0.71

CodePudding user response:

We don't need mutate (mutate creates or transforms column). Here, the output needed is one row per group which can be done with summarise

library(dplyr)
df1 %>% 
   group_by(x) %>%
   summarise(count_y = n(), Std_Dev = sd(y))

-output

# A tibble: 3 × 3
      x count_y Std_Dev
  <int>   <int>   <dbl>
1     1       2   1.41 
2     2       2   2.12 
3     3       2   0.707

data

df1 <- structure(list(ID = 1:6, x = c(1L, 2L, 3L, 1L, 2L, 3L), y = c(4L, 
3L, 7L, 2L, 6L, 8L)), class = "data.frame", row.names = c(NA, 
-6L))
  • Related