df <- structure(list(id = c(1L, 1L, 2L, 3L, 3L, 3L, 4L), hire_year = c(2017L,
2017L, 2017L, 2017L, 2016L, 2014L, 2016L), dummy = c(0L, 0L,
1L, 0L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA,
-7L))
id hire_year dummy
1 1 2017 0
2 1 2017 0
3 2 2017 1
4 3 2017 0
5 3 2016 0
6 3 2014 0
7 4 2016 1
I would like to count the rows where dummy equals 0, but each id should be counted only once, even if that id has more than one row with dummy equal to 0. Here I would expect the output to be 2 (ids 1 and 3).
Answer:
You may use distinct to keep only the unique id/dummy combinations, then count the 0's.
library(dplyr)
df %>%
  distinct(id, dummy) %>%
  summarise(dummy = sum(dummy == 0))
#   dummy
# 1     2
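Deduplicating on id alone with distinct(id, .keep_all = TRUE) keeps only each id's first row, so it can miss an id whose 0 appears in a later row; deduplicating on the (id, dummy) pair avoids this. A minimal sketch on hypothetical data (df2 is illustrative and not from the question):

```r
library(dplyr)

# Hypothetical data: id 5's first row has dummy 1, but a later row has dummy 0,
# so id 5 should be counted once.
df2 <- data.frame(id = c(5L, 5L), dummy = c(1L, 0L))

# Keeping only the first row per id inspects dummy = 1 and misses id 5.
first_row <- df2 %>%
  distinct(id, .keep_all = TRUE) %>%
  summarise(dummy = sum(dummy == 0))

# Deduplicating on the (id, dummy) pair keeps the (5, 0) row and counts it.
by_pair <- df2 %>%
  distinct(id, dummy) %>%
  summarise(dummy = sum(dummy == 0))

first_row$dummy
# [1] 0
by_pair$dummy
# [1] 1
```

On the question's data both versions happen to return 2, because ids 1 and 3 have a 0 in their first row.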
Answer:
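In base R, take the ids of the rows where dummy is 0, drop the duplicates, and count them:
length(unique(df$id[df$dummy == 0]))
# [1] 2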
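An equivalent base R sketch, offered as an alternative rather than the answerer's method: flag each id with tapply() according to whether any of its rows has dummy equal to 0, then sum the flags. The data frame is rebuilt here so the snippet runs on its own.

```r
df <- data.frame(
  id = c(1L, 1L, 2L, 3L, 3L, 3L, 4L),
  hire_year = c(2017L, 2017L, 2017L, 2017L, 2016L, 2014L, 2016L),
  dummy = c(0L, 0L, 1L, 0L, 0L, 0L, 1L)
)

# For each id, TRUE if at least one of its rows has dummy == 0
has_zero <- tapply(df$dummy == 0, df$id, any)

# TRUE counts as 1, so the sum is the number of qualifying ids
n_ids <- sum(has_zero)
n_ids
# [1] 2
```

The per-id flags in has_zero also make it easy to see which ids were counted.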
Answer:
Use filter to keep only the rows where dummy is 0, distinct to keep each id only once, and summarise to count the remaining rows:
library(tidyverse)
df <- tibble(
  id = c(1, 1, 2, 3, 3, 3, 4),
  hire_year = c(rep(2017, 4), 2016, 2014, 2016),
  dummy = c(0, 0, 1, 0, 0, 0, 1)
)
df %>%
  filter(dummy == 0) %>%
  distinct(id) %>%
  summarise(count = n())
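The distinct() and n() steps can also be folded into dplyr's n_distinct(), which counts unique values directly. A minimal sketch of that variant, rebuilding the question's data so it runs on its own:

```r
library(dplyr)

df <- data.frame(
  id = c(1L, 1L, 2L, 3L, 3L, 3L, 4L),
  hire_year = c(2017L, 2017L, 2017L, 2017L, 2016L, 2014L, 2016L),
  dummy = c(0L, 0L, 1L, 0L, 0L, 0L, 1L)
)

# Keep the dummy == 0 rows, then count the distinct ids in one step
result <- df %>%
  filter(dummy == 0) %>%
  summarise(count = n_distinct(id))

result$count
# [1] 2
```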