Consider the following minimal working example in R:
library(tidyverse)
df <- data_frame(
  colour = c('red', 'red', 'blue', 'blue'),
  value = c(1, 1, 2, 2)
)
df %>%
  group_by(colour) %>%
  summarise(
    value = mean(value),
    value.sd = sd(value),
  )
The output is
# A tibble: 2 × 3
  colour value value.sd
  <chr>  <dbl>    <dbl>
1 blue       2       NA
2 red        1       NA
when the expected output is
# A tibble: 2 × 3
  colour value value.sd
  <chr>  <dbl>    <dbl>
1 blue       2        0
2 red        1        0
I know how to work around the issue; the following code gives the expected output:
df %>%
  group_by(colour) %>%
  summarise(
    value.mean = mean(value),
    value.sd = sd(value),
  )
My question is: am I using R/dplyr wrongly in the first code sample, or is this a bug in dplyr?
Answer:
When I ran your code I got a warning that data_frame() is deprecated.
This works:
df <- tibble(
  colour = c('red', 'red', 'blue', 'blue'),
  value = c(1, 1, 2, 2)
)
df %>%
  group_by(colour) %>%
  summarise(
    value.mean = mean(value),
    value.sd = sd(value)
  )
# A tibble: 2 × 3
  colour value.mean value.sd
  <chr>       <dbl>    <dbl>
1 blue            2        0
2 red             1        0
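Note that what makes the difference here is the output name value.mean, not the switch to tibble(), which only removes the warning. summarise() evaluates its arguments in order, and later arguments see the results of earlier ones. In your first sample, value=mean(value) replaces value with a single number per group before sd(value) runs, and the standard deviation of a single number is NA. So this is expected dplyr behaviour rather than a bug.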
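
If you want to keep the column name value, one option is to change the order of the arguments. The following is a minimal sketch, assuming a recent dplyr in which summarise() expressions are evaluated in order and a later expression sees a column overwritten by an earlier one. The name res is only used for illustration.

```r
library(dplyr)

df <- tibble(
  colour = c("red", "red", "blue", "blue"),
  value  = c(1, 1, 2, 2)
)

# sd() of a single number is NA. This is what sd(value) sees once
# `value` has already been replaced by the per-group mean.
single_sd <- sd(2)

# Computing the standard deviation first keeps the original column
# available; `value` is only overwritten by the last expression.
res <- df %>%
  group_by(colour) %>%
  summarise(
    value.sd = sd(value),   # still refers to the original column
    value    = mean(value)  # overwrites `value` afterwards
  )

print(single_sd)
print(res)
```

With this ordering the columns come out as colour, value.sd, value. Giving the mean its own name, as in value.mean above, avoids depending on argument order at all.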