I'm trying to use dplyr to find the simple mean of a column of values I grouped together.
My initial attempt was the following code:
cust_id_flags_3 = customer_sleep %>% group_by(flags) %>% count(flags) %>% summarise(mean_val = mean(n))
But the output I get is a table:
# A tibble: 27 x 2
flags mean_val
<dbl> <dbl>
1 0 1966
2 1 2555
3 2 1263
4 3 1694
5 4 1452
6 5 989
7 6 879
8 7 709
9 8 712
10 9 530
# ... with 17 more rows
What I wanted was the mean of the values in the column mean_val. I am able to get it by computing it manually:
> mean_test = sum(cust_id_flags_3$mean_val)/nrow(cust_id_flags_3)
> mean_test
[1] 569.037
Below is the data set I'm using to perform the calculation. I know I'm doing something wrong in how I apply my tidyverse verbs. For context, I'm doing this so I can illustrate means for use in a Poisson regression. Thanks for any assistance.
> dput(cust_id_flags_3)
structure(list(flags = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26),
n = c(1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -27L), groups = structure(list(
flags = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), .rows = structure(list(
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
25L, 26L, 27L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -27L), .drop = TRUE))
CodePudding user response:
Your data is already grouped, so summarise() returns one mean per group. I can replicate the mean of 569 by ungrouping first:
library(dplyr)
cust_id_flags_3 %>%
ungroup() %>%
summarise(mean_val = mean(n))
Right now you only have one value per group (flags), so each group's mean is just that value divided by 1. If I adjust your data to include more than one value for some groups, group_by() in combination with summarise() works as expected.
df <- tibble(
flags = c(
0, 1, 1, 2, 2, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
),
n = c(
1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L
)
)
df %>%
group_by(flags) %>%
summarise(mean_val = mean(n), count = n())
count = n() adds an integer column giving the number of observations per group.
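To make the difference concrete, here is a minimal sketch that runs the same summarise() call with and without grouping. It assumes the per-flag counts from the question's dput() output; the names counts, per_group and overall are made up for illustration.

```r
library(dplyr)

# Counts per flag value, copied from the dput() output in the question.
# `counts` is a hypothetical name used only for this sketch.
counts <- tibble(
  flags = 0:26,
  n = c(1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
        712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
        86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L)
)

# Grouped: summarise() computes one mean per flag, so 27 rows come back
# and each "mean" is just the single count in that group.
per_group <- counts %>%
  group_by(flags) %>%
  summarise(mean_val = mean(n))

# Not grouped: summarise() collapses the whole column into one overall mean.
overall <- counts %>%
  summarise(mean_val = mean(n))

nrow(per_group)   # number of rows in the grouped result
overall$mean_val  # the single overall mean
```

The overall value should match the 569.037 computed manually in the question, since it is the same sum divided by the same 27 rows.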