Adding "count where" in dplyr (R)

I am using the dplyr library in R. I have the following data:

col1 = as.factor(c("a", "a", "a", "b", "b", "c", "c", "c"))
col2 = c(1,1,0,0,1, 0, 0, 1)

dplyr_data = data.frame(col1, col2)

dplyr_data
  col1 col2
1    a    1
2    a    1
3    a    0
4    b    0
5    b    1
6    c    0
7    c    0
8    c    1

I am wondering if it is possible to write code like this directly:

library(dplyr)

summary_dplyr = data.frame(dplyr_data %>% group_by(col1) %>% dplyr::summarise(mean_count = mean(col1, na.rm = TRUE), special_count = count(1 - nrow(dplyr_data))))

This returns the following error:

Error: Problem with `summarise()` input `special_count`.
x no applicable method for 'group_vars' applied to an object of class "c('double', 'numeric')"
i Input `special_count` is `count(1 - nrow(dplyr_data))`.
i The error occurred in group 1: col1 = "a".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In mean.default(col1, na.rm = TRUE) :
  argument is not numeric or logical: returning NA
2: In mean.default(col1, na.rm = TRUE) :
  argument is not numeric or logical: returning NA
3: In mean.default(col1, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

I am trying to get the following output:

  col1 mean_count special_count
1    a       0.67       3-1 = 2
2    b       0.50       2-1 = 1
3    c       0.33       3-2 = 1

Basically, "special_count" is computed for each group of col1 (i.e. a, b, c): take the number of rows in the group and subtract the number of 0's in col2.

Can someone please show me how to do this?

Thanks

CodePudding user response:

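You can't use count() inside summarize(): count() expects a data frame, not a vector, which is why the error complains about 'group_vars' being applied to a numeric object. Instead, count values by summing a logical vector. sum(col2 == 0) gives the number of rows with 0, and n() gives the total number of rows (per group). Also note that the warnings come from mean(col1): col1 is a factor, so take the mean of col2 instead.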

dplyr_data %>% 
  group_by(col1) %>% 
  summarize(mean_count=mean(col2),
          special_count = n() - sum(col2==0))
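
For reference, here is a self-contained sketch of the same approach that rebuilds the data from the question and converts the result to a plain data frame, as the original attempt did. The extra column name nonzero_count is only an illustrative assumption; it shows that counting the non-zero rows directly gives the same number as n() minus the zeros. The .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same data as in the question
dplyr_data <- data.frame(
  col1 = as.factor(c("a", "a", "a", "b", "b", "c", "c", "c")),
  col2 = c(1, 1, 0, 0, 1, 0, 0, 1)
)

summary_dplyr <- dplyr_data %>%
  group_by(col1) %>%
  summarise(
    mean_count    = mean(col2, na.rm = TRUE),
    special_count = n() - sum(col2 == 0),  # rows in the group minus the zeros
    nonzero_count = sum(col2 != 0),        # hypothetical extra column: same value, counted directly
    .groups = "drop"                       # assumes dplyr >= 1.0.0
  ) %>%
  as.data.frame()

print(summary_dplyr)
```

With the sample data, special_count should come out as 2, 1 and 1 for groups a, b and c, which matches the output asked for in the question. If col2 could contain NA, both sums would need na.rm = TRUE, and you would have to decide whether NA rows count toward the total.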