Home > Enterprise >  Extract rows using multiple conditions related to the order of occurrence of zero and one in R rule(
Extract rows using multiple conditions related to the order of occurrence of zero and one in R rule(

Time:05-31

I have a sample dataset that I would like to extract based on few conditions related to 0 and 1. this is the sample data.

df <- data.frame(ID=c(60,60,60,60,60,60,60,60,60,60,61,61,61,61,61,61,61,61,61,61), var=c("X1", "X2","X3", "X4", "X5","X6","X7","X8","X9","X10","X1", "X2","X3", "X4", "X5","X6","X7","X8","X9","X10"), 
                 zero=c(0,0,NA,NA,NA,0,NA,0,NA,NA,NA,NA,NA,0,NA,0,NA,NA,NA,0), pos=c(NA,NA,1,1,NA,NA,1,NA,1,NA,1,NA,1,NA,1,NA,NA,1,1,NA))
ID var zero pos
60 X1 0 NA
60 X2 0 NA
60 X3 NA 1
60 X4 NA 1
60 X5 NA NA
60 X6 0 NA
60 X7 NA 1
60 X8 0 NA
60 X9 NA 1
60 X10 NA NA
61 X1 NA 1

In each ID (60,61), I wanted to extract rows based on variables in columns name 'zero' and 'pos'. the first condition is, I want to extract rows that have a first occurrence of 1 and a last occurrence of 1. secondly, I want to extract row that have zero before the first occurrence of 1. Thirdly, I want to extract row that zero occur after the first occurance of 1. Lastly, I want to extract the last row with either 0 or 1.

Thus, the resulting table would look like this.

df2 <- data.frame(ID=c(60,60,60,60,61,61,61,61), var=c("X2","X3","X6","X9","X1", "X4","X9","X10"), 
                 zero=c(0,NA,0,NA,NA,0,NA,0), pos=c(NA,1,NA,1,1,NA,1,NA))
ID var zero pos
60 X2 0 NA
60 X3 NA 1
60 X6 0 NA
60 X9 NA 1
61 X1 NA 1
61 X4 0 NA
61 X9 NA 1
61 X10 0 NA

I tried with rle() and match but I did not succeed. Please if anyone is familiar with extracting data like this. I would be really appreciated. Thank you in advance.

CodePudding user response:

You could create a small function that reflects your four conditions, and then apply that function by group

f <- function(z,p) {
  p1 = which(p==1)
  z0 = which(z==0)
  c1 = c(p1[1],p1[length(p1)])
  c2 = ifelse(z[p1[1]-1]==0, as.integer(p1[1]-1),as.integer(NA))
  c3 = min(z0[which(z0>p1[1])], na.rm=T)
  c4 = max(p1,z0, na.rm=T)
  unique(c(c1,c2,c3,c4))
}

Now, apply that function by group

libary(dplyr)
df %>%
  group_by(ID) %>%
  filter(row_number() %in% f(zero,pos))

Output:

     ID var    zero   pos
  <dbl> <chr> <dbl> <dbl>
1    60 X2        0    NA
2    60 X3       NA     1
3    60 X6        0    NA
4    60 X9       NA     1
5    61 X1       NA     1
6    61 X4        0    NA
7    61 X9       NA     1
8    61 X10       0    NA

Or, using data.table

library(data.table)
setDT(df)[, .SD[f(zero,pos)], by=ID]

Output:

      ID    var  zero   pos
   <num> <char> <num> <num>
1:    60     X3    NA     1
2:    60     X9    NA     1
3:    60     X2     0    NA
4:    60     X6     0    NA
5:    61     X1    NA     1
6:    61     X9    NA     1
7:    61     X4     0    NA
8:    61    X10     0    NA
  • Related