I got the following dataset
ID = c('A','A','A','A','A','A','B','B','B','B')
B = c(1,2,1,3,2,2,1,2,3,4)
date = as.Date(c('1990-01-01','1990-01-01','1990-01-02','1990-01-02',
'1990-01-03','1990-01-04','1990-03-01','1990-03-01',
'1990-03-01','1990-03-01'))
dataset1 <- data.frame(ID,B,date)
ID B date
1 A 1 1990-01-01
2 A 2 1990-01-01
3 A 1 1990-01-02
4 A 3 1990-01-02
5 A 2 1990-01-03
6 A 2 1990-01-04
7 B 1 1990-03-01
8 B 2 1990-03-01
9 B 3 1990-03-01
10 B 4 1990-03-01
Desired output:
ID B date
2 A 2 1990-01-01
4 A 3 1990-01-02
8 B 2 1990-03-01
9 B 3 1990-03-01
10 B 4 1990-03-01
or
ID B date
1 A 1 1990-01-01
2 A 2 1990-01-01
3 A 1 1990-01-02
4 A 3 1990-01-02
7 B 1 1990-03-01
8 B 2 1990-03-01
9 B 3 1990-03-01
10 B 4 1990-03-01
I can simply drop rows with B=1 in the second case.
I want to get the other values of column B in which the date is equal to the date at B=1. For example, ID 'A' with B=1 has dates 1990-01-01 and 1990-01-02. I want to get the rows based on date[B=1].
My attempt :
dataset1 %>% group_by(ID) %>% arrange(date) %>%
filter(date %in% date[B=1])
I got:
ID B date
<chr> <dbl> <date>
1 A 1 1990-01-01
2 A 2 1990-01-01
3 B 1 1990-03-01
4 B 2 1990-03-01
5 B 3 1990-03-01
6 B 4 1990-03-01
These rows are missing:
A 1 1990-01-02
A 3 1990-01-02
I also tried add any or all inside filter but didn't have what I wanted. Appreciate any helps!
CodePudding user response:
You almost have it: the comparison B=1
it should be B==2
==
is always used for equality testing: See explanation here: What is the difference between = and ==?
library(dplyr)
dataset1 %>%
group_by(ID) %>%
arrange(date, .by_group = TRUE) %>%
filter(date %in% date[B==1])
ID B date
<chr> <dbl> <date>
1 A 1 1990-01-01
2 A 2 1990-01-01
3 A 1 1990-01-02
4 A 3 1990-01-02
5 B 1 1990-03-01
6 B 2 1990-03-01
7 B 3 1990-03-01
8 B 4 1990-03-01
CodePudding user response:
We can use a group by approach with 'ID', 'date' as groups and then filter
where 'B' is not 1 and where there are only groups having 'B' value of 1
library(dplyr)
dataset1 %>%
group_by(ID, date) %>%
filter(B != 1 & any(B == 1)) %>%
ungroup
-output
# A tibble: 5 × 3
ID B date
<chr> <dbl> <date>
1 A 2 1990-01-01
2 A 3 1990-01-02
3 B 2 1990-03-01
4 B 3 1990-03-01
5 B 4 1990-03-01
CodePudding user response:
Or without any grouping:
library(dplyr)
dataset1 |>
filter(date %in% date[B == 1] & B != 1)
Output:
ID B date
1 A 2 1990-01-01
2 A 3 1990-01-02
3 B 2 1990-03-01
4 B 3 1990-03-01
5 B 4 1990-03-01