Home > other >  Group_by - retain rows based on value in below row of dataset
Group_by - retain rows based on value in below row of dataset

Time:08-12

Hi I am looking to retain rows in a dataset similar to the below:

ID Value1 Value2
A 1 0
A 0 1
A 1 1
A 0 1
A 0 0
A 0 0
A 1 0
A 1 1

Where 'Value1' = 1 and 'Value2' in the immediate below row = 1. Under these conditions both rows should be retained; any other rows corresponding to ID 'A' should not be retained. Can anyone help with this please? In this example the below output should be returned:

ID Value1 Value2
A 1 0
A 0 1
A 1 1
A 0 1
A 1 0
A 0 1

CodePudding user response:

The logic is keep all the rows where row before has Value1=1 and row immediately after has Value2=1. I've added a few rows to your data to check different scenarios.

df=structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A"), Value1 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), Value2 = c(0L, 
1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-9L))

  ID Value1 Value2
1  A      1      0
2  A      0      1
3  A      0      0
4  A      1      0
5  A      0      0
6  A      0      1
7  A      1      0
8  A      0      1
9  A      0      1

edit: your edit requires you to distinguish between 1's in Value1 and Value2 columns, there are probably a number of options available here, one option is to say that if Value=1 then this starts a new sequence, so the next row needs to have Value2=1 and Value1!=1.

tmp=which((df$Value1==1) c(tail(df$Value1!=1 & df$Value2==1,-1),NA)==2)
df[sort(c(tmp,tmp 1)),]

  ID Value1 Value2
1  A      1      0
2  A      0      1
7  A      1      0
8  A      0      1

note the row names/indices.

CodePudding user response:

  • You can try
library(dplyr)

inds <- df |> summarise(n = which(Value1 == 1 & c(Value2[2:n()] , 0) == 1))

df |> slice(unlist(Map(c , inds$n , inds$n   1)))
  • data
  ID Value1 Value2
1  A      1      0
2  A      0      1
3  A      1      0
4  A      0      1
  •  Tags:  
  • r
  • Related