Loop to capture differences greater than threshold in R

Time:12-21

I have a dataset formatted as follows:

person_ID  exam_ID value_1  number_studies
A1         1A1     2        3
A1         2A1     3        3
A1         3A1     1        3
A2         1A2     2        5
A2         2A2     3        5
A2         3A2     3.5      5
A2         4A2     1.5      5
A2         5A2     1.0      5

The data is ordered by person_ID and then by exam_ID. Within each person_ID, I would like to find the first row whose value_1 is more than 1 lower than the previous row's (a difference of less than -1) and remove that row along with every row after it.

For example, for person_ID 'A1', I would keep exam_IDs '1A1' and '2A1', but remove '3A1' as the difference between value_1 for '3A1-2A1' is < -1. For person_ID 'A2', I would remove exam_IDs 4A2 and 5A2.

I thought to do this with nested while loops to create a list of exam_IDs and then subset my dataframe, but the code does not work. See example below. I would appreciate any advice/suggestions!

z1 <- list()
for(person in unique(df$person_ID)) {
  tempdata <- subset(df, df$person_ID == person)
  t1 <- seq(from = 1, to = (unique(tempdata$number_studies)-1))
  i <- 0
  t <- 1
  while(t < (unique(tempdata$number_studies)-1)){
    while(i>-1){
      i <- tempdata[t + 1,3] - tempdata[t,3]
      tempID <- tempdata[t,]
      z1 <- append(z1, tempID$exam_ID)
      t <- t + 1
    }
  }
}

CodePudding user response:

You don't need a loop for this. Here's a solution using data.table

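library(data.table)
# dat is the data frame from the question (called df there); setDT converts it in place
setDT(dat)
# Per person: running count of drops larger than 1, then keep rows where the count is still 0
dat[ , drop := cumsum(c(0, diff(value_1)) < -1), by = person_ID][drop == 0, !"drop"]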


   person_ID exam_ID value_1 number_studies
1:        A1     1A1     2.0              3
2:        A1     2A1     3.0              3
3:        A2     1A2     2.0              5
4:        A2     2A2     3.0              5
5:        A2     3A2     3.5              5

To understand how it works: a helper column called drop is created that holds, within each person_ID, a running count of the rows whose value_1 is more than 1 below the previous row's (a difference of less than -1). The first row of each person is given a difference of 0, so it is never counted. Only the rows where drop is still 0 are then returned, and the drop column itself is left out of the output.
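
If it helps to see the intermediate drop column, the same idea can be sketched in base R without data.table. This is only an illustration, not the answer's code: the data frame is rebuilt by hand from the table in the question, the name df is taken from the question's loop, and drops_so_far is a made-up helper name.

```r
# Sample data from the question, rebuilt by hand as a plain data.frame
df <- data.frame(
  person_ID      = c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A2"),
  exam_ID        = c("1A1", "2A1", "3A1", "1A2", "2A2", "3A2", "4A2", "5A2"),
  value_1        = c(2, 3, 1, 2, 3, 3.5, 1.5, 1.0),
  number_studies = c(3, 3, 3, 5, 5, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one person's values, stays 0 until the first
# difference below -1 and is 1 or more from that row onwards
drops_so_far <- function(v) cumsum(c(0, diff(v)) < -1)

# ave() applies the helper within each person_ID and returns a vector
# aligned with the rows of df
df$drop <- ave(df$value_1, df$person_ID, FUN = drops_so_far)

# Keep only the rows before the first large drop, without the helper column
result <- df[df$drop == 0, c("person_ID", "exam_ID", "value_1", "number_studies")]

print(df)      # shows the drop column next to the original values
print(result)  # rows 1A1, 2A1, 1A2, 2A2, 3A2
```

Because cumsum never decreases, once a large drop has been seen every later row of that person is excluded too, even if value_1 rises again afterwards.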
