I have a dataset similar to this:
df <-
read.table(textConnection("ID Column1 Column2
A 0 1
A 1 0
A 1 0
A 1 0
A 0 1
A 1 0
A 0 1
A 0 0
A 1 0
A 1 0
B 0 1
B 1 0
C 0 1
C 0 0
C 1 0"), header=TRUE)
I am looking to do a group_by ID in dplyr that maintains records where Column2 = '1' and the record underneath it has Column1 = '1'. This may happen more than once per ID; all other records should be excluded. So the output from the above should be:
ID | Column1 | Column2 |
---|---|---|
A | 0 | 1 |
A | 1 | 0 |
A | 0 | 1 |
A | 1 | 0 |
B | 0 | 1 |
B | 1 | 0 |
Any help will be very much appreciated, thanks!
CodePudding user response:
You could use lag
and lead
:
library(dplyr)
df %>%
group_by(ID) %>%
filter(lead(Column1) == 1 & Column2 == 1 |
Column1 == 1 & lag(Column2) == 1) %>%
ungroup()
# # A tibble: 6 × 3
# ID Column1 Column2
# <chr> <int> <int>
# 1 A 0 1
# 2 A 1 0
# 3 A 0 1
# 4 A 1 0
# 5 B 0 1
# 6 B 1 0
CodePudding user response:
Here is an alternative approach:
library(dplyr)
df %>%
group_by(ID, x = rep(row_number(), each=2, length.out = n())) %>%
filter(sum(Column1)>=1 & sum(Column2)>=1) %>%
ungroup() %>%
select(-x)
ID Column1 Column2
<chr> <int> <int>
1 A 0 1
2 A 1 0
3 A 0 1
4 A 1 0
5 B 0 1
6 B 1 0