In my dataset there is the variable "cigarettes per day" (CPD) for 21 days and several subjects (ID). I want to know how many and which subjects never smoked (e.g. have only 0 in CPD) across the 21 days.
Here is a example dataset for 3 subjects and 5 days
day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
CPD <- c(3,4,0,2,0,0,0,0,0,0,4,0,0,0,1)
df <- data.frame(day, ID, CPD)
what I want would be something like this:
day ID CPD
1 1 2 0
2 2 2 0
3 3 2 0
4 4 2 0
5 5 2 0
CodePudding user response:
We may use a group by all
approach
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(CPD %in% 0)) %>%
ungroup
-output
# A tibble: 5 × 3
day ID CPD
<dbl> <dbl> <dbl>
1 1 2 0
2 2 2 0
3 3 2 0
4 4 2 0
5 5 2 0
Or without grouping
df %>%
filter(!ID %in% ID[CPD != 0])
day ID CPD
1 1 2 0
2 2 2 0
3 3 2 0
4 4 2 0
5 5 2 0
Or with base R
subset(df, !ID %in% ID[CPD != 0])
CodePudding user response:
Here is a slighltly modified dplyr
(@akrun) approach:
libaray(dplyr)
df %>%
group_by(ID) %>%
filter(all(CPD==0)==TRUE)
# Groups: ID [1]
day ID CPD
<dbl> <dbl> <dbl>
1 1 2 0
2 2 2 0
3 3 2 0
4 4 2 0
5 5 2 0
and here is a data.table approach:
library(data.table)
setDT(df)[,if(CPD %in% 0) .SD , by = ID]
ID day CPD
1: 2 1 0
2: 2 2 0
3: 2 3 0
4: 2 4 0
5: 2 5 0