Count number of individuals with a condition (dummy) in panel data

Due to privacy issues, I can't share the original dataset or my original code. Therefore, I have created an example.

Suppose that I want to count how many individuals have obtained a degree in higher education. This means that I want to know for how many individuals the HEdummy == 0. I am struggling with how to do this. In the example below, the correct answer would be 0. I have tried to create a table and to use the count/unique functions, but I have no clue how to distinguish between individuals without summing all the '1's.

df <- data.frame(Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
                 Time = c("2011", "2012", "2013", "2014", "2011", "2012", "2012", "2017", "2014", "2015", "2016", "2017"),
                 HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0"))
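The question does not say why the expected answer is 0. One reading under which 0 is correct, offered only as an assumption and not as the asker's stated intent, is to count individuals whose dummy switches from "0" to "1" between consecutive observed years. A minimal base R sketch of that reading, using the example data (the names switched and n_switched are illustrative):

```r
df <- data.frame(Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
                 Time = c("2011", "2012", "2013", "2014", "2011", "2012", "2012", "2017", "2014", "2015", "2016", "2017"),
                 HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0"))

# Sort by individual and year so consecutive rows are consecutive observations
df <- df[order(df$Individual, df$Time), ]

# Assumed interpretation: TRUE if the dummy goes from "0" to "1" at some point
# (an individual observed only once can never switch)
switched <- tapply(df$HigherEducationDummy, df$Individual, function(x) {
  any(x[-length(x)] == "0" & x[-1] == "1")
})

n_switched <- sum(switched)
print(n_switched)
```

In the example nobody changes status within the panel, so this count is 0; with real data the result depends on whether this interpretation matches what "obtained a degree" is meant to capture.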

Answer 1:

Not sure why the answer would be 0, but based on the rest of the description it seems you could summarize over the years for each individual.

library(dplyr)

df %>% 
  group_by(Individual) %>% 
  # One row per individual: TRUE if the dummy is never "1" in any year
  summarize(noHE = !any(HigherEducationDummy == "1")) %>%
  pull(noHE) %>% 
  sum()

This tells you how many individuals never had higher education in any of the observed years. You could also replace sum with table to get counts for both categories.
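The same per-individual summary can be sketched in base R with tapply, which also shows what the table variant returns. This is a minimal sketch on the example data; the names no_he and counts are illustrative:

```r
df <- data.frame(Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
                 Time = c("2011", "2012", "2013", "2014", "2011", "2012", "2012", "2017", "2014", "2015", "2016", "2017"),
                 HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0"))

# One logical per individual: TRUE if the dummy is never "1" in any observed year
no_he <- tapply(df$HigherEducationDummy == "1", df$Individual,
                function(x) !any(x))

# Number of individuals who never have higher education
print(sum(no_he))

# Counts for both categories (FALSE = HE in some year, TRUE = never)
counts <- table(no_he)
print(counts)
```

Collapsing to one value per individual before counting is what avoids summing every row with a 1: individual 1 contributes once, not four times.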

Answer 2:

In base R, we can count the individuals who never have the dummy equal to 1:

with(df, sum(!unique(Individual) %in% 
       unique(Individual[HigherEducationDummy == 1])))
[1] 2
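For the complementary count, the individuals who have the dummy equal to 1 in at least one observed year, the same unique() idea can be sketched directly. This uses only base R and the example data; n_with_he and n_total are illustrative names:

```r
df <- data.frame(Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
                 Time = c("2011", "2012", "2013", "2014", "2011", "2012", "2012", "2017", "2014", "2015", "2016", "2017"),
                 HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0"))

# Distinct individuals with the dummy equal to "1" in at least one year
with_he <- unique(df$Individual[df$HigherEducationDummy == "1"])
n_with_he <- length(with_he)

# Total number of distinct individuals in the panel
n_total <- length(unique(df$Individual))

print(c(with_he = n_with_he, without_he = n_total - n_with_he))
```

In the example this gives 2 individuals with higher education (1 and 3) and 2 without (2 and 4), which matches the result of the %in% approach above.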