Home > Back-end >  R time series data: removing values from column that are less than previous value
R time series data: removing values from column that are less than previous value

Time:10-27

I am currently working with a data set that resembles the following:

    date         value     ID
    10/20/21     123       1
    10/21/21     121       1
    10/22/21     122       1
    10/23/21     125       1
    10/24/21     127       1
    10/25/21     126       1
    10/26/21     125       1
    10/27/21     128       1
    10/28/21     129       1
    10/20/21     155       2
    10/21/21     156       2
    10/22/21     152       2
    10/23/21     154       2
    10/24/21     157       2
    10/25/21     158       2
    10/26/21     159       2
    10/27/21     160       2
    10/28/21     162       2

The value column should always be increasing in value as time goes forward. If it does not (or rather if the value is less than the value prior to itself) then that value should be replaced with an NA.

The result I'm trying to achieve would look like this:

    date         value     ID
    10/20/21     123       1
    10/21/21     NA        1
    10/22/21     NA        1
    10/23/21     125       1
    10/24/21     127       1
    10/25/21     NA        1
    10/26/21     NA        1
    10/27/21     128       1
    10/28/21     129       1
    10/20/21     155       2
    10/21/21     156       2
    10/22/21     NA        2
    10/23/21     NA        2
    10/24/21     157       2
    10/25/21     158       2
    10/26/21     159       2
    10/27/21     160       2
    10/28/21     162       2

I have tried solving this using a simple lag() function, but this only works for the previous value of each row and doesn't work when more than one number in succession needs to be replaced with an NA.

Is there an efficient way to achieve this result?

CodePudding user response:

Using dd from the Note at the end, compare the current value to the cumulative maximum:

library(data.table)

DT <- as.data.table(dd)
DT[, value2 := ifelse(cummax(value) > value, NA, value)]
DT

giving:

        date value ID value2
 1: 10/20/21   123  1    123
 2: 10/21/21   121  1     NA
 3: 10/22/21   122  1     NA
 4: 10/23/21   125  1    125
 5: 10/24/21   127  1    127
 6: 10/25/21   126  1     NA
 7: 10/26/21   125  1     NA
 8: 10/27/21   128  1    128
 9: 10/28/21   129  1    129
10: 10/20/21   155  2    155
11: 10/21/21   156  2    156
12: 10/22/21   152  2     NA
13: 10/23/21   154  2     NA
14: 10/24/21   157  2    157
15: 10/25/21   158  2    158
16: 10/26/21   159  2    159
17: 10/27/21   160  2    160
18: 10/28/21   162  2    162

Note

The input in reproducible form is:

dd <- 
structure(list(date = c("10/20/21", "10/21/21", "10/22/21", "10/23/21", 
"10/24/21", "10/25/21", "10/26/21", "10/27/21", "10/28/21", "10/20/21", 
"10/21/21", "10/22/21", "10/23/21", "10/24/21", "10/25/21", "10/26/21", 
"10/27/21", "10/28/21"), value = c(123L, 121L, 122L, 125L, 127L, 
126L, 125L, 128L, 129L, 155L, 156L, 152L, 154L, 157L, 158L, 159L, 
160L, 162L), ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-18L))

CodePudding user response:

You can apply cummax without data.table

df$value2 = with(df, ifelse(cummax(value) > value, NA, value))
  • Related