Adding "count where" in dplyr (R)

Time:10-28

I am using the dplyr library in R. I have the following data:

col1 = as.factor(c("a", "a", "a", "b", "b", "c", "c", "c"))
col2 = c(1,1,0,0,1, 0, 0, 1)

dplyr_data = data.frame(col1, col2)

head(dplyr_data)
  col1 col2
1    a    1
2    a    1
3    a    0
4    b    0
5    b    1
6    c    0
7    c    0
8    c    1

I am wondering if it is possible to write code directly like this:

library(dplyr)

summary_dplyr = data.frame(dplyr_data %>% group_by(col1) %>% dplyr::summarise(mean_count = mean(col1, na.rm = TRUE), special_count = count(1 - nrow(dplyr_data))))

This returns the following error:

Error: Problem with `summarise()` input `special_count`.
x no applicable method for 'group_vars' applied to an object of class "c('double', 'numeric')"
i Input `special_count` is `count(1 - nrow(dplyr_data))`.
i The error occurred in group 1: col1 = "a".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In mean.default(col1, na.rm = TRUE) :
  argument is not numeric or logical: returning NA
2: In mean.default(col1, na.rm = TRUE) :
  argument is not numeric or logical: returning NA
3: In mean.default(col1, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

I am trying to get the following output:

  col1 mean_count special_count
1    a       0.66       3-1 = 2
2    b       0.50       2-1 = 1
3    c       0.33       3-2 = 1

Basically, "special_count" means: for each unique group of col1 (i.e. a, b, c), take the total number of rows in that group and subtract the number of rows where col2 is 0.

Can someone please show me how to do this?

Thanks

CodePudding user response:

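You can't use count() inside summarize(): count() expects a data frame, not a vector, which is why the error mentions 'group_vars'. You can count values instead by calling sum() on a logical vector. sum(col2 == 0) gives the number of rows with 0, and n() gives the total number of rows (per group). Also note that the mean should be taken over col2 rather than col1; col1 is a factor, which is what triggers the mean.default warnings.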

dplyr_data %>% 
  group_by(col1) %>% 
  summarize(mean_count=mean(col2),
          special_count = n() - sum(col2==0))
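
For completeness, here is a self-contained sketch that rebuilds the data from the question and runs the same summary end to end. The extra column nonzero_count is not part of the original answer; it is added only to illustrate that, assuming col2 has no missing values, n() - sum(col2 == 0) is the same as counting the non-zero rows directly.

```r
library(dplyr)

# Rebuild the example data from the question
dplyr_data <- data.frame(
  col1 = as.factor(c("a", "a", "a", "b", "b", "c", "c", "c")),
  col2 = c(1, 1, 0, 0, 1, 0, 0, 1)
)

summary_dplyr <- dplyr_data %>%
  group_by(col1) %>%
  summarise(
    mean_count    = mean(col2, na.rm = TRUE),   # per-group mean of col2
    special_count = n() - sum(col2 == 0),       # rows in group minus rows equal to 0
    nonzero_count = sum(col2 != 0)              # illustrative alternative (assumes no NAs)
  ) %>%
  as.data.frame()

print(summary_dplyr)
```

With this data, special_count comes out as 2, 1 and 1 for groups a, b and c, matching the output requested in the question. If col2 could contain NA, both counts would need explicit handling (for example na.rm = TRUE inside sum()), since a comparison against NA yields NA.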