Home > Net >  Retain records where condition is met across two rows and two columns
Retain records where condition is met across two rows and two columns

Time:08-11

I have a dataset similar to this:

df <- 
  read.table(textConnection("ID   Column1   Column2
A 0 1
A 1 0
A 1 0
A 1 0
A 0 1
A 1 0
A 0 1
A 0 0 
A 1 0 
A 1 0
B 0 1
B 1 0 
C 0 1
C 0 0
C 1 0"), header=TRUE)

I am looking to do a group_by ID in dplyr that maintains records where Column2 = '1' and the record underneath it has Column1 = '1'. This may happen more than once per ID; all other records should be excluded. So the output from the above should be:

ID Column1 Column2
A 0 1
A 1 0
A 0 1
A 1 0
B 0 1
B 1 0

Any help will be very much appreciated, thanks!

CodePudding user response:

You could use lag and lead:

library(dplyr)

df %>%
  group_by(ID) %>%
  filter(lead(Column1) == 1 &     Column2  == 1 |
              Column1  == 1 & lag(Column2) == 1) %>%
  ungroup()

# # A tibble: 6 × 3
#   ID    Column1 Column2
#   <chr>   <int>   <int>
# 1 A           0       1
# 2 A           1       0
# 3 A           0       1
# 4 A           1       0
# 5 B           0       1
# 6 B           1       0

CodePudding user response:

Here is an alternative approach:

library(dplyr)

df %>% 
  group_by(ID, x = rep(row_number(), each=2, length.out = n())) %>% 
  filter(sum(Column1)>=1 & sum(Column2)>=1) %>% 
  ungroup() %>% 
  select(-x)

  ID    Column1 Column2
  <chr>   <int>   <int>
1 A           0       1
2 A           1       0
3 A           0       1
4 A           1       0
5 B           0       1
6 B           1       0
  • Related