Home > Blockchain >  R dplyr get subset with certain condition
R dplyr get subset with certain condition

Time:08-04

I got the following dataset

ID = c('A','A','A','A','A','A','B','B','B','B')
B = c(1,2,1,3,2,2,1,2,3,4)
date = as.Date(c('1990-01-01','1990-01-01','1990-01-02','1990-01-02',
                 '1990-01-03','1990-01-04','1990-03-01','1990-03-01',
                 '1990-03-01','1990-03-01'))

dataset1 <- data.frame(ID,B,date)

   ID B       date
1   A 1 1990-01-01
2   A 2 1990-01-01
3   A 1 1990-01-02
4   A 3 1990-01-02
5   A 2 1990-01-03
6   A 2 1990-01-04
7   B 1 1990-03-01
8   B 2 1990-03-01
9   B 3 1990-03-01
10  B 4 1990-03-01

Desired output:

   ID B       date
2   A 2 1990-01-01
4   A 3 1990-01-02
8   B 2 1990-03-01
9   B 3 1990-03-01
10  B 4 1990-03-01

or

   ID B       date
1   A 1 1990-01-01
2   A 2 1990-01-01
3   A 1 1990-01-02
4   A 3 1990-01-02
7   B 1 1990-03-01
8   B 2 1990-03-01
9   B 3 1990-03-01
10  B 4 1990-03-01

I can simply drop rows with B=1 in the second case.

I want to get the other values of column B in which the date is equal to the date at B=1. For example, ID 'A' with B=1 has dates 1990-01-01 and 1990-01-02. I want to get the rows based on date[B=1].

My attempt :

dataset1 %>% group_by(ID) %>% arrange(date) %>% 
  filter(date %in% date[B=1])

I got:

ID        B date      
  <chr> <dbl> <date>    
1 A         1 1990-01-01
2 A         2 1990-01-01
3 B         1 1990-03-01
4 B         2 1990-03-01
5 B         3 1990-03-01
6 B         4 1990-03-01

These rows are missing:

  A         1 1990-01-02
  A         3 1990-01-02

I also tried add any or all inside filter but didn't have what I wanted. Appreciate any helps!

CodePudding user response:

You almost have it: the comparison B=1 it should be B==2 == is always used for equality testing: See explanation here: What is the difference between = and ==?

library(dplyr)

dataset1 %>% 
  group_by(ID) %>% 
  arrange(date, .by_group = TRUE) %>% 
  filter(date %in% date[B==1])

  ID        B date      
  <chr> <dbl> <date>    
1 A         1 1990-01-01
2 A         2 1990-01-01
3 A         1 1990-01-02
4 A         3 1990-01-02
5 B         1 1990-03-01
6 B         2 1990-03-01
7 B         3 1990-03-01
8 B         4 1990-03-01

CodePudding user response:

We can use a group by approach with 'ID', 'date' as groups and then filter where 'B' is not 1 and where there are only groups having 'B' value of 1

library(dplyr)
dataset1 %>% 
   group_by(ID, date) %>%
   filter(B != 1 & any(B == 1)) %>%
   ungroup

-output

# A tibble: 5 × 3
  ID        B date      
  <chr> <dbl> <date>    
1 A         2 1990-01-01
2 A         3 1990-01-02
3 B         2 1990-03-01
4 B         3 1990-03-01
5 B         4 1990-03-01

CodePudding user response:

Or without any grouping:

library(dplyr)

dataset1 |> 
  filter(date %in% date[B == 1] & B != 1)

Output:

  ID B       date
1  A 2 1990-01-01
2  A 3 1990-01-02
3  B 2 1990-03-01
4  B 3 1990-03-01
5  B 4 1990-03-01
  • Related