How to match one row from one column to the next 5-10 rows in two other columns in R?

Time:08-17

I have a data frame which looks like this:

df1 <- structure(list(
  day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
  observ1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  observ2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
  observ3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))

Previously I got a TRUE value if observ1 equals 1 and after 5 to 10 days, observ2 also equals 1.
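A minimal sketch of that earlier two-condition rule, assuming (as the expected output below suggests) that TRUE is placed on the row where observ2 fires and that `day` counts consecutive days. The name `prev_check` is hypothetical, and a plain base-R data.frame stands in for the tibble.

```r
# Base-R copy of the example data (plain data.frame instead of a tibble)
df1 <- data.frame(
  day     = 1:20,
  observ1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  observ2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
  observ3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Assumption: row i is TRUE when observ2 == 1 on day i and observ1 == 1
# on some day between 10 and 5 days earlier (both ends inclusive).
prev_check <- with(df1, vapply(seq_along(day), function(i) {
  observ2[i] == 1 &&
    any(observ1 == 1 & day >= day[i] - 10 & day <= day[i] - 5)
}, logical(1)))

which(prev_check)  # 14 20
```

Under this reading both day 14 and day 20 qualify; the extra observ3 condition is what should leave only day 14.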

Now I need to add a third condition: if observ1 equals 1, and 5-10 days later observ2 equals 1 AND observ3 also equals 1 within the same 5-10 day window, then return TRUE.

So, the new 'check' column should look like this:

df1 <- structure(list(
  day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
  observ1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  observ2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
  observ3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  check = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
            FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))
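One possible reading of the three-condition rule, sketched under stated assumptions: TRUE goes on the row where observ2 fires, the observ1 event must lie 5-10 days earlier (day d), and observ3 must fire somewhere in d + 5 .. d + 10, the same window after that observ1 event. The helper `flag_sequence` and its `lo`/`hi` arguments are hypothetical names, and a base-R data.frame replaces the tibble. With the example data it flags only day 14, matching the expected check column.

```r
# Base-R copy of the example data (plain data.frame instead of a tibble)
df1 <- data.frame(
  day     = 1:20,
  observ1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  observ2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
  observ3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Hypothetical helper. Row i is flagged when:
#   * observ2 == 1 on day i,
#   * observ1 == 1 on some start day d with day[i] - hi <= d <= day[i] - lo, and
#   * observ3 == 1 on some day in d + lo .. d + hi (same window after d).
flag_sequence <- function(day, observ1, observ2, observ3, lo = 5, hi = 10) {
  vapply(seq_along(day), function(i) {
    if (observ2[i] != 1) return(FALSE)
    # Candidate start days: observ1 fired lo..hi days before day[i]
    starts <- day[observ1 == 1 & day >= day[i] - hi & day <= day[i] - lo]
    # any(logical(0)) is FALSE, so rows with no candidate start are FALSE
    any(vapply(starts, function(d) {
      any(observ3 == 1 & day >= d + lo & day <= d + hi)
    }, logical(1)))
  }, logical(1))
}

df1$check <- with(df1, flag_sequence(day, observ1, observ2, observ3))
which(df1$check)  # 14
```

Returning FALSE instead of NA keeps `check` a plain logical column, and `lo`/`hi` make the 5-10 day window easy to adjust. Because the comparison uses the `day` values rather than row positions, it should also tolerate gaps in the day sequence.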

Answer:

Hopefully this helps. Thanks for asking this as a new question; that is generally the way to go when you need to extend your original question. I'm not sure this is correct, so please let me know whether it is what you are after.

df1$check <- with(
  df1, 
  vapply(
    seq_along(observ1),
    function(i){
      # If we are less than five days in:
      if(i - 5 <= 0){
        # Return NA: logical scalar
        NA
      # Otherwise:
      }else{
        # Keep the window inside the data by bounding it to rows 1..n:
        # idx_lower_bound, idx_upper_bound => integer scalars
        idx_lower_bound <- max(i - 10, 1)
        idx_upper_bound <- min(i + 5, length(observ1))
        # Compute the index window: idx => integer vector
        idx <- seq(
          idx_lower_bound,
          idx_upper_bound,
          by = 1
        )
        # Test if all conditions are true: 
        # check => logical scalar
        check <- all(
          # The current value of observ1 == 1 ? logical scalar
          observ1[i] == 1,
          # Any observ2 values in the range == 1 ? logical scalar
          any(observ2[idx] == 1),
          # Any observ3 values in the range == 1 ? logical scalar
          any(observ3[idx] == 1)
        )
        # Return TRUE when every condition holds, otherwise NA
        if (isTRUE(check)) TRUE else NA
      }
    },
    logical(1)
  )
)

Data:

df1 <- structure(
  list(
    day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20), 
    observ1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0), 
    observ2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1), 
    observ3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
    ),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L)
)