Having trouble finding simple means with dplyr - `mean()` is not behaving as expected

I'm trying to use dplyr to find the simple mean of a column of values I grouped together:

My initial attempts were to enter the code as follows:

cust_id_flags_3 = customer_sleep %>% group_by(flags) %>% count(flags) %>% summarise(mean_val = mean(n))

But the output I get is a table with one row per flag:

# A tibble: 27 x 2
   flags mean_val
   <dbl>    <dbl>
 1     0     1966
 2     1     2555
 3     2     1263
 4     3     1694
 5     4     1452
 6     5      989
 7     6      879
 8     7      709
 9     8      712
10     9      530
# ... with 17 more rows

What I wanted was the mean of the values in the column mean_val. I am able to get it by computing it manually:

> mean_test = sum(cust_id_flags_3$mean_val)/nrow(cust_id_flags_3) 
> mean_test
[1] 569.037
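The manual calculation above is the same arithmetic that base R's mean() performs on a vector. Here is a minimal check that uses the n values copied from the dput() output below, on the assumption that those are the values being averaged:

```r
# Counts per flag, copied from the dput() output of cust_id_flags_3
n <- c(1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
       712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
       86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L)

# Same arithmetic as sum(...)/nrow(...) on a one-column table
manual <- sum(n) / length(n)

# Base R's mean() gives the same result
builtin <- mean(n)

all.equal(manual, builtin)
```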

Below is the data set I'm using for the calculation. I'm sure the problem is in how I'm applying the tidyverse verbs. For context, I need this mean to help illustrate a Poisson regression. Thanks for any assistance.

> dput(cust_id_flags_3)
structure(list(flags = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), 
    n = c(1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L, 
    712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L, 
    86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -27L), groups = structure(list(
    flags = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), .rows = structure(list(
        1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
        14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 
        25L, 26L, 27L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -27L), .drop = TRUE))

Answer:

Your data is already grouped by flags, so summarise() computes one mean per group. Removing the grouping first reproduces the mean of 569:

library(dplyr)
# Drop the existing grouping so summarise() collapses to a single row
cust_id_flags_3 %>% 
  ungroup() %>%
  summarise(mean_val = mean(n))
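To see where the grouping comes from, group_vars() reports the grouping variables of a tibble. count() keeps the grouping of its input, so the group_by(flags) in the original pipeline carries through to summarise(). Leaving out group_by() gives an ungrouped count table, and summarise() then returns a single row. The sketch below uses a small, hypothetical stand-in for customer_sleep, because the real data isn't shown, and assumes dplyr 1.0 or later:

```r
library(dplyr)

# Hypothetical stand-in for customer_sleep: one row per observation
customer_sleep <- tibble(flags = c(0, 0, 1, 1, 1, 2))

# Original pipeline: the grouping by flags survives count()
grouped_counts <- customer_sleep %>% group_by(flags) %>% count(flags)
group_vars(grouped_counts)

# Without group_by(), count() returns an ungrouped tibble,
# so summarise() collapses the counts into one overall mean
overall <- customer_sleep %>%
  count(flags) %>%
  summarise(mean_val = mean(n))
overall
```

With this toy data, the counts per flag are 2, 3 and 1, so the single mean_val is 2.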

Right now there is only one value per group (flags), so each group's mean is just that single value divided by 1. If I adjust your data so that some groups contain more than one value, group_by() combined with summarise() works as expected:

df <- tibble(
  flags = c(
    0, 1, 1, 2, 2, 5, 6, 7, 8, 9, 10, 11,
    12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
  ),
  n = c(
    1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
    712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
    86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L
  )
)
df %>% 
  group_by(flags) %>%
  summarise(mean_val = mean(n), count = n())

count = n() adds an integer column with the number of observations in each group.
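If you want both the per-group means and one overall figure, you can pipe the per-group result into a second summarise(), which collapses it to a single row. This is a sketch on small, made-up data. It also assumes that the unweighted mean of the group means is the quantity you want, which differs from the mean of all rows when groups have different sizes:

```r
library(dplyr)

# Small illustrative data: flag 1 has two rows, the others one each
df <- tibble(
  flags = c(0, 1, 1, 2),
  n     = c(10L, 20L, 40L, 20L)
)

# One row per flag; .groups = "drop" returns an ungrouped tibble
per_group <- df %>%
  group_by(flags) %>%
  summarise(mean_val = mean(n), count = n(), .groups = "drop")

# Second summarise() on the ungrouped result gives a single value
overall <- per_group %>%
  summarise(mean_of_means = mean(mean_val))

# Unweighted mean of group means is (10 + 30 + 20) / 3 = 20,
# whereas the mean of all rows, mean(df$n), is 90 / 4 = 22.5
overall
```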