Count first occurrence of a dummy (grouped by) in R and then sum

structure(list(id = c(1L, 1L, 2L, 3L, 3L, 3L, 4L), hire_year = c(2017L, 
2017L, 2017L, 2017L, 2016L, 2014L, 2016L), dummy = c(0L, 0L, 
1L, 0L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA, 
-7L))

  id hire_year dummy
1  1      2017     0
2  1      2017     0
3  2      2017     1
4  3      2017     0
5  3      2016     0
6  3      2014     0
7  4      2016     1

I would like to count the number of rows for which dummy equals 0. However, each id should be counted only once, even if that id has more than one row with dummy equal to 0. For the data above I would expect the output to be 2.

CodePudding user response：

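You may use distinct() to keep only the first row for each id, then count the 0s among those rows. This gives the right answer when dummy is constant within each id, as it is in the sample data.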

library(dplyr)

df %>%
  distinct(id, .keep_all = TRUE) %>%
  summarise(dummy = sum(dummy == 0))

#  dummy
#1     2
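
Because distinct(id, .keep_all = TRUE) keeps the first row it sees for each id, the count can come out too low if an id mixes 0s and 1s. The following is a minimal sketch of that case, assuming dplyr is installed; the data frame df2 and the column name any_zero are made up for illustration and do not come from the question:

```r
library(dplyr)

# Hypothetical data: id 1 has dummy = 1 on its first row and 0 on its second
df2 <- data.frame(id = c(1L, 1L, 2L), dummy = c(1L, 0L, 1L))

# distinct() keeps the first row per id, so the 0 for id 1 is missed
first_row_count <- df2 %>%
  distinct(id, .keep_all = TRUE) %>%
  summarise(dummy = sum(dummy == 0)) %>%
  pull(dummy)

# Flag each id that has at least one 0, then count the flagged ids
any_zero_count <- df2 %>%
  group_by(id) %>%
  summarise(any_zero = any(dummy == 0)) %>%
  summarise(count = sum(any_zero)) %>%
  pull(count)

first_row_count  # 0
any_zero_count   # 1
```

If dummy never varies within an id, both versions should agree, so the shorter distinct() form is fine for data like the sample.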

CodePudding user response：

length(unique(df$id[df$dummy==0]))
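
To see why this one-liner works, it may help to unpack it into steps. This is only an illustrative breakdown using the sample data from the question; the intermediate variable names are made up here:

```r
# Sample data from the question
df <- data.frame(
  id = c(1L, 1L, 2L, 3L, 3L, 3L, 4L),
  hire_year = c(2017L, 2017L, 2017L, 2017L, 2016L, 2014L, 2016L),
  dummy = c(0L, 0L, 1L, 0L, 0L, 0L, 1L)
)

zero_rows <- df$dummy == 0            # TRUE for rows where dummy is 0
zero_ids <- df$id[zero_rows]          # ids on those rows: 1 1 3 3 3
unique_zero_ids <- unique(zero_ids)   # each id once: 1 3
n_zero_ids <- length(unique_zero_ids) # number of such ids
n_zero_ids                            # 2
```

Since it collects every id that has a 0 on any row, this approach does not depend on the row order within an id.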

CodePudding user response：

Use filter() to keep only the rows where dummy is 0, distinct() to keep each id once, and summarise() to count the remaining rows:

library(tidyverse)
df <- tibble(id = c(1,1,2,3,3,3,4), hire_year = c(rep(2017, 4), 2016, 2014, 2016), dummy = c(0,0,1,0,0,0,1))
df %>%
  filter(dummy == 0) %>%   # keep rows where dummy is 0
  distinct(id) %>%         # one row per id
  summarise(count = n())   # count the remaining ids