Summarising same column twice with dplyr return NA

Time:03-17

Consider the following minimal working example in R:

library(tidyverse)

df <- data_frame(
  colour=c('red', 'red', 'blue', 'blue'),
  value=c(1, 1, 2, 2)
)

df %>%
  group_by(colour) %>%
  summarise(
    value=mean(value),
    value.sd=sd(value),
  )

The output is

# A tibble: 2 × 3
  colour value value.sd
  <chr>  <dbl>    <dbl>
1 blue       2       NA
2 red        1       NA

when the expected output is

# A tibble: 2 × 3
  colour value value.sd
  <chr>  <dbl>    <dbl>
1 blue       2        0
2 red        1        0

I know how to work around the issue: the following code gives the expected output:

df %>%
  group_by(colour) %>%
  summarise(
    value.mean=mean(value),
    value.sd=sd(value),
  )

My question is: am I using R/dplyr incorrectly in the first code sample, or is this a bug in dplyr?

Answer:

When I ran your code, I got a warning that data_frame() is deprecated, so I used tibble() instead.

This works:

df <- tibble(
    colour = c('red', 'red', 'blue', 'blue'),
    value = c(1, 1, 2, 2)
)

df %>%
    group_by(colour) %>%
    summarise(
        value.mean = mean(value),  # new name, so `value` still holds the original rows
        value.sd = sd(value)
    )
# A tibble: 2 × 3
  colour value.mean value.sd
  <chr>       <dbl>    <dbl>
1 blue            2        0
2 red             1        0

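The important change is that the mean gets a new name. summarise() evaluates its arguments in order, and later arguments can see the columns created by earlier ones. In your first code sample, value=mean(value) replaces value with the per-group mean, so sd(value) is then computed on a single number, and sd() of a single number is NA. This is expected dplyr behaviour rather than a bug, so I would suggest giving the summary a different name (or computing the standard deviation first).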
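A minimal sketch of the "compute the standard deviation first" option, assuming dplyr 1.0 or later; the names res and the select() call used to restore the column order are illustrative choices, not part of the original answer. It also shows the single-value sd() call that produces the NA in the question.

```r
library(dplyr)

df <- tibble(
  colour = c("red", "red", "blue", "blue"),
  value = c(1, 1, 2, 2)
)

# sd() of a single observation is NA; this is what the question's first
# summarise() computes once `value` has been replaced by the group mean
sd(2)

# Reordered arguments: sd() runs while `value` still holds the original rows,
# and only afterwards is `value` overwritten with the per-group mean
res <- df %>%
  group_by(colour) %>%
  summarise(
    value.sd = sd(value),
    value = mean(value)
  ) %>%
  select(colour, value, value.sd)  # put the columns back in the question's order

res
```

Another option on dplyr 1.0 or later is summarise(across(value, list(mean = mean, sd = sd))), which names the outputs value_mean and value_sd and so never overwrites the input column.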
