I have a sample dataset that I would like to extract based on few conditions related to 0 and 1. this is the sample data.
df <- data.frame(ID=c(60,60,60,60,60,60,60,60,60,60,61,61,61,61,61,61,61,61,61,61), var=c("X1", "X2","X3", "X4", "X5","X6","X7","X8","X9","X10","X1", "X2","X3", "X4", "X5","X6","X7","X8","X9","X10"),
zero=c(0,0,NA,NA,NA,0,NA,0,NA,NA,NA,NA,NA,0,NA,0,NA,NA,NA,0), pos=c(NA,NA,1,1,NA,NA,1,NA,1,NA,1,NA,1,NA,1,NA,NA,1,1,NA))
ID | var | zero | pos |
---|---|---|---|
60 | X1 | 0 | NA |
60 | X2 | 0 | NA |
60 | X3 | NA | 1 |
60 | X4 | NA | 1 |
60 | X5 | NA | NA |
60 | X6 | 0 | NA |
60 | X7 | NA | 1 |
60 | X8 | 0 | NA |
60 | X9 | NA | 1 |
60 | X10 | NA | NA |
61 | X1 | NA | 1 |
In each ID (60,61), I wanted to extract rows based on variables in columns name 'zero' and 'pos'. the first condition is, I want to extract rows that have a first occurrence of 1 and a last occurrence of 1. secondly, I want to extract row that have zero before the first occurrence of 1. Thirdly, I want to extract row that zero occur after the first occurance of 1. Lastly, I want to extract the last row with either 0 or 1.
Thus, the resulting table would look like this.
df2 <- data.frame(ID=c(60,60,60,60,61,61,61,61), var=c("X2","X3","X6","X9","X1", "X4","X9","X10"),
zero=c(0,NA,0,NA,NA,0,NA,0), pos=c(NA,1,NA,1,1,NA,1,NA))
ID | var | zero | pos |
---|---|---|---|
60 | X2 | 0 | NA |
60 | X3 | NA | 1 |
60 | X6 | 0 | NA |
60 | X9 | NA | 1 |
61 | X1 | NA | 1 |
61 | X4 | 0 | NA |
61 | X9 | NA | 1 |
61 | X10 | 0 | NA |
I tried with rle() and match but I did not succeed. Please if anyone is familiar with extracting data like this. I would be really appreciated. Thank you in advance.
CodePudding user response:
You could create a small function that reflects your four conditions, and then apply that function by group
f <- function(z,p) {
p1 = which(p==1)
z0 = which(z==0)
c1 = c(p1[1],p1[length(p1)])
c2 = ifelse(z[p1[1]-1]==0, as.integer(p1[1]-1),as.integer(NA))
c3 = min(z0[which(z0>p1[1])], na.rm=T)
c4 = max(p1,z0, na.rm=T)
unique(c(c1,c2,c3,c4))
}
Now, apply that function by group
libary(dplyr)
df %>%
group_by(ID) %>%
filter(row_number() %in% f(zero,pos))
Output:
ID var zero pos
<dbl> <chr> <dbl> <dbl>
1 60 X2 0 NA
2 60 X3 NA 1
3 60 X6 0 NA
4 60 X9 NA 1
5 61 X1 NA 1
6 61 X4 0 NA
7 61 X9 NA 1
8 61 X10 0 NA
Or, using data.table
library(data.table)
setDT(df)[, .SD[f(zero,pos)], by=ID]
Output:
ID var zero pos
<num> <char> <num> <num>
1: 60 X3 NA 1
2: 60 X9 NA 1
3: 60 X2 0 NA
4: 60 X6 0 NA
5: 61 X1 NA 1
6: 61 X9 NA 1
7: 61 X4 0 NA
8: 61 X10 0 NA