Dataframe containing all information for unique IDs based on presence/absence of another variable

Time:02-18

I have a dataset that looks somewhat as follows.

data <- data.frame(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4),
  death = c(0,0,1,0,0,0,0,1,0,0,0,0),
  other = letters[1:12])

I need to create a new data frame that keeps every row for any ID that has at least one death recorded, much like this:

ID Death Other
1 0 a
1 0 b
1 1 c
3 0 g
3 1 h
3 0 i

I feel like I'm missing something simple, but any time I try to subset by ID, I get error messages about length and not being able to subset with longer/shorter vectors. Any help would be much appreciated!

CodePudding user response:

Here's a dplyr approach. It treats each id as a group and filters out any group that does not have at least one row with death > 0.

library(dplyr)

data %>% group_by(id) %>% filter(any(death > 0))

# A tibble: 6 x 3
# Groups:   id [2]
     id death other
  <dbl> <dbl> <chr>
1     1     0 a    
2     1     0 b    
3     1     1 c    
4     3     0 g    
5     3     1 h    
6     3     0 i    
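
To make the "one test per group" idea concrete, here is a minimal base R sketch of the same logic using `ave()`. It is only an illustration of what the grouped filter does, not part of the dplyr answer itself, and the names `keep` and `by_group` are made up for this example.

```r
# Same example data as in the question
data <- data.frame(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4),
  death = c(0,0,1,0,0,0,0,1,0,0,0,0),
  other = letters[1:12])

# For each id, any() is evaluated once over that group's rows, and ave()
# copies the single TRUE/FALSE result back onto every row of the group
keep <- ave(data$death > 0, data$id, FUN = any)

# keep has one element per row of data, so it can index rows directly
by_group <- data[keep, ]
```

Because `keep` is always as long as `nrow(data)`, the subset cannot run into a length mismatch.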

CodePudding user response:

As @thelatemail points out in the comments, this can be done in base R with:

data <- data.frame(id = c(1,1,1,2,2,2,3,3,3,4,4,4), death = c(0,0,1,0,0,0,0,1,0,0,0,0), other = letters[1:12])

data[data$id %in% data$id[data$death == 1], ]
#>   id death other
#> 1  1     0     a
#> 2  1     0     b
#> 3  1     1     c
#> 7  3     0     g
#> 8  3     1     h
#> 9  3     0     i

Created on 2022-02-18 by the reprex package (v2.0.1)
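
For readers who find the one-liner dense, the sketch below splits it into steps. The intermediate names (`is_death`, `ids_with_death`, `keep`, `result`) are hypothetical and chosen only for illustration. The question does not show the code that failed, so this is a guess, but length complaints of that kind often come from comparing `data$id` with `==` against a shorter vector of ids; `%in%` avoids this because it returns one value per element of its left-hand side.

```r
# Same example data as in the question
data <- data.frame(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4),
  death = c(0,0,1,0,0,0,0,1,0,0,0,0),
  other = letters[1:12])

# Step 1: flag the rows that record a death (one TRUE/FALSE per row)
is_death <- data$death == 1

# Step 2: collect the ids that appear on those rows
ids_with_death <- unique(data$id[is_death])

# Step 3: flag every row whose id is in that set; the result of %in%
# is as long as data$id, regardless of how many ids were collected
keep <- data$id %in% ids_with_death

# Step 4: use the full-length logical vector to select rows
result <- data[keep, ]
```

The `unique()` call is not strictly needed, since `%in%` ignores duplicates, but it makes the intermediate vector easier to inspect.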
