I have the following table and I have to obtain a standard deviation of y for each unique value of x.
ID x y
1 1 4
2 2 3
3 3 7
4 1 2
5 2 6
6 3 8
For example, each unique value of x, I have y=4 and y=2, so the standard deviation will be:
x1 <- c(4,2)
sd(x1)
#output is 1.41
x2 <-c(3,6)
sd(x2)
#output is 2.21
x3 <-c(3,6)
sd(x3)
#output is 0.71
Instead of getting each output and put it in a data frame using the long way, is there a way to do it faster using dplyr and the pipe? I tried to use mutate and group_by, but it doesn't seem to work. I would like the result to look the following with count_y (# of y values to each unique x)
x count_y Std_Dev
1 2 1.41
2 2 2.21
3 2 0.71
CodePudding user response:
We don't need mutate
(mutate
creates or transforms column). Here, the output needed is one row per group which can be done with summarise
library(dplyr)
df1 %>%
group_by(x) %>%
summarise(count_y = n(), Std_Dev = sd(y))
-output
# A tibble: 3 × 3
x count_y Std_Dev
<int> <int> <dbl>
1 1 2 1.41
2 2 2 2.12
3 3 2 0.707
data
df1 <- structure(list(ID = 1:6, x = c(1L, 2L, 3L, 1L, 2L, 3L), y = c(4L,
3L, 7L, 2L, 6L, 8L)), class = "data.frame", row.names = c(NA,
-6L))