Create counter of consecutive runs of a certain value and ignoring rows based on a condition

Time:01-30

I have a dataset that looks like this:

# A tibble: 987 × 2
   time   counts
   <time> <chr> 
 1 07:33  1358  
 2 07:34  1072  
 3 07:35  112   
 4 07:36  316   
 5 07:37  0     
 6 07:38  16    
 7 07:39  32    
 8 07:40  0     
 9 07:41  0     
10 07:42  92    
# … with 977 more rows

I need to create a new variable that assigns an ID to each consecutive run and starts a new run whenever counts == 0. In addition, the counter should ignore certain non-zero rows and treat them as 0: two consecutive rows with counts <100 that immediately follow a row with counts == 0, or a single row with counts >1000 that immediately follows a row with counts == 0.

So, these are the conditions for the counter:

  • Restarts with 0

  • Ignores rows (treats them as 0) if immediately after the row where counts == 0, two consecutive rows have counts <100 or one row has counts >1000.

I have to do this because I need to delete every run of 60 or more consecutive rows with counts == 0, BUT such a run may include two consecutive rows with counts <100 or one row with counts >1000.

I'm not sure if this makes sense. Hopefully, it does!

I have tried the following:

db %>% 
  mutate(consec_id = rleid(counts==0))

But I don't know how to tell R to ignore certain rows based on specific conditions.
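For reference, here is a minimal sketch of what rleid(counts == 0) returns on its own. The values are made up for illustration; the point is that every flip between zero and non-zero starts a new id, so short non-zero stretches split a zero run.

```r
library(data.table)  # provides rleid()

# Made-up values for illustration only
counts <- c(1358, 112, 0, 16, 32, 0, 0, 92)

# A new id starts every time the value of (counts == 0) changes
run_id <- rleid(counts == 0)
print(run_id)  # 1 1 2 3 3 4 4 5
```

The rows 16 and 32 get their own id (3), which is why they need to be recoded to 0 before calling rleid.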

I also tried using a loop, but I am very new to loops, so I didn't get what I wanted:

# Loop through the dataset and update the "wear_status" variable
for (i in 2:nrow(day1)) {
  # Check if current counts = 0
  if (db$counts[i] == 0) {
    consec_counts <- 0
    db$wear_status[i] <- 0
  } else {
    # Check if current counts are between 1 and 100
    if (db$counts[i] > 1 || db$counts[i] < 100) {
      consec_counts <- consec_counts + 1
    } else {
      consec_counts <- 0
    }
    # Check if current counts > 100 or 3 consecutive rows of counts = 1 or 100
    if (db$counts[i] > 100 || consec_counts <= 2) {
      consec_counts <- 0
      db$wear_status[i] <- 1
    }
  }
}

This is my expected output:

# A tibble: 987 × 3
   time   counts  counter
   <time> <chr>   <int>
 1 07:33  1358     1
 2 07:34  1072     1
 3 07:35  112      1  
 4 07:36  316      1
 5 07:37  0        2
 6 07:38  16       2  
 7 07:39  32       2  
 8 07:40  0        2
 9 07:41  0        2
10 07:42  92       3
11 07:43  80       3
12 07:44  78       3
13 07:45  0        4
# … with 977 more rows

Thank you!

Answer:

This is something to try without using a loop.

Three conditions cause a counts value to be treated as zero:

  1. If the previous value is 0, the current value is < 100, the next value is < 100, and the value after that (two rows ahead) is 0.
  2. If the value two rows back is 0, the previous value is < 100, the current value is < 100, and the next value is 0.
  3. If the previous value is 0, the current value is > 1000, and the next value is 0.

You can use lag and lead to look at prior and following values. Both take an argument n, the number of positions to shift by, so lag(counts, n = 2) looks at the value two rows back.
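As a quick illustration of how lag and lead shift a vector, here is a small sketch on made-up values (the vector x and the result names are only for this example). Positions with no neighbour are filled with NA.

```r
library(dplyr)

x <- c(10, 20, 30, 40, 50)  # illustrative values, not from the question

prev1 <- dplyr::lag(x)          # NA 10 20 30 40
prev2 <- dplyr::lag(x, n = 2)   # NA NA 10 20 30
next1 <- dplyr::lead(x)         # 20 30 40 50 NA
next2 <- dplyr::lead(x, n = 2)  # 30 40 50 NA NA
```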

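library(tidyverse)   # dplyr: mutate(), lag(), lead()
library(data.table)  # rleid()

df %>%
  mutate(new_counts = ifelse(
    # 1. first of two rows < 100 enclosed by zeros
    (lag(counts) == 0 & counts < 100 & lead(counts) < 100 & lead(counts, 2) == 0) |
    # 2. second of two rows < 100 enclosed by zeros
    (lag(counts, 2) == 0 & lag(counts) < 100 & counts < 100 & lead(counts) == 0) |
    # 3. single row > 1000 enclosed by zeros
    (lag(counts) == 0 & counts > 1000 & lead(counts) == 0),
    0,
    counts
  )) %>%
  # consecutive rows with the same zero / non-zero status share an id
  mutate(counter = rleid(new_counts == 0))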
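Two things may need attention on the real data. The tibble in the question stores counts as <chr>, so it should be converted to numeric before comparing. Also, lag and lead return NA at the edges, so the combined condition can evaluate to NA for rows near the start or end. The following is only a sketch of one way to handle both, assuming that edge rows with missing neighbours should simply be left unchanged; df_chr, ignore and res are hypothetical names used for this example.

```r
library(dplyr)
library(data.table)  # rleid()

# Hypothetical input: counts stored as character, series starts with small values
df_chr <- data.frame(counts = c("50", "40", "0", "1200", "0", "300"))

res <- df_chr %>%
  mutate(
    counts = as.numeric(counts),  # compare numbers, not strings
    ignore = (lag(counts) == 0 & counts < 100 & lead(counts) < 100 & lead(counts, 2) == 0) |
      (lag(counts, 2) == 0 & lag(counts) < 100 & counts < 100 & lead(counts) == 0) |
      (lag(counts) == 0 & counts > 1000 & lead(counts) == 0),
    # Assumption: an NA condition (missing neighbour) means "do not ignore"
    ignore = coalesce(ignore, FALSE),
    new_counts = if_else(ignore, 0, counts),
    counter = rleid(new_counts == 0)
  )

print(res)
```

In this example the first two rows would get an NA condition without coalesce, because their earlier neighbours do not exist; with it they keep their original values, and only the 1200 between two zeros is recoded to 0.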

Output

    time counts new_counts counter
1  07:33   1358       1358       1
2  07:34   1072       1072       1
3  07:35    112        112       1
4  07:36    316        316       1
5  07:37      0          0       2
6  07:38     16          0       2
7  07:39     32          0       2
8  07:40      0          0       2
9  07:41      0          0       2
10 07:42     92         92       3
11 07:43     80         80       3
12 07:44     78         78       3
13 07:45      0          0       4
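The question's end goal is to delete runs of 60 or more consecutive zeros once the short interruptions have been recoded. One possible way to do that with the counter is sketched below on made-up data; demo, min_run and kept are hypothetical names, and the run lengths are chosen only to show one long and one short zero run.

```r
library(dplyr)
library(data.table)  # rleid()

min_run <- 60  # threshold taken from the question

# Made-up data: 5 non-zero rows, 70 zeros, 3 non-zero rows, 10 zeros
demo <- data.frame(
  new_counts = c(rep(200, 5), rep(0, 70), rep(150, 3), rep(0, 10))
)

kept <- demo %>%
  mutate(counter = rleid(new_counts == 0)) %>%
  group_by(counter) %>%
  # drop a run only if it is all zeros and at least min_run rows long
  filter(!(all(new_counts == 0) & n() >= min_run)) %>%
  ungroup()

nrow(kept)  # 18: the 70-row zero run is removed, the 10-row one is kept
```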

Data

df <- structure(list(time = c("07:33", "07:34", "07:35", "07:36", "07:37", 
"07:38", "07:39", "07:40", "07:41", "07:42", "07:43", "07:44", 
"07:45"), counts = c(1358L, 1072L, 112L, 316L, 0L, 16L, 32L, 
0L, 0L, 92L, 80L, 78L, 0L)), row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11", "12", "13"), class = "data.frame")