Home > database >  Which IDs have only zero-counts in variable across all days?
Which IDs have only zero-counts in variable across all days?

Time:08-04

In my dataset there is the variable "cigarettes per day" (CPD) for 21 days and several subjects (ID). I want to know how many and which subjects never smoked (e.g. have only 0 in CPD) across the 21 days.

Here is a example dataset for 3 subjects and 5 days

day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
CPD <- c(3,4,0,2,0,0,0,0,0,0,4,0,0,0,1)
df <- data.frame(day, ID, CPD)

what I want would be something like this:

   day ID CPD
1    1  2 0
2    2  2 0
3    3  2 0
4    4  2 0
5    5  2 0

CodePudding user response:

We may use a group by all approach

library(dplyr)
df %>% 
   group_by(ID) %>%
   filter(all(CPD %in% 0)) %>%
   ungroup

-output

# A tibble: 5 × 3
    day    ID   CPD
  <dbl> <dbl> <dbl>
1     1     2     0
2     2     2     0
3     3     2     0
4     4     2     0
5     5     2     0

Or without grouping

df %>% 
   filter(!ID %in% ID[CPD != 0])
  day ID CPD
1   1  2   0
2   2  2   0
3   3  2   0
4   4  2   0
5   5  2   0

Or with base R

subset(df, !ID %in% ID[CPD != 0])

CodePudding user response:

Here is a slighltly modified dplyr (@akrun) approach:

libaray(dplyr)

df %>% 
  group_by(ID) %>% 
  filter(all(CPD==0)==TRUE)

# Groups:   ID [1]
    day    ID   CPD
  <dbl> <dbl> <dbl>
1     1     2     0
2     2     2     0
3     3     2     0
4     4     2     0
5     5     2     0

and here is a data.table approach:

library(data.table)

setDT(df)[,if(CPD %in% 0) .SD , by = ID]  

   ID day CPD
1:  2   1   0
2:  2   2   0
3:  2   3   0
4:  2   4   0
5:  2   5   0
  • Related