I am currently working with a data set that resembles the following:
date value ID
10/20/21 123 1
10/21/21 121 1
10/22/21 122 1
10/23/21 125 1
10/24/21 127 1
10/25/21 126 1
10/26/21 125 1
10/27/21 128 1
10/28/21 129 1
10/20/21 155 2
10/21/21 156 2
10/22/21 152 2
10/23/21 154 2
10/24/21 157 2
10/25/21 158 2
10/26/21 159 2
10/27/21 160 2
10/28/21 162 2
The value column should always be increasing in value as time goes forward. If it does not (or rather if the value is less than the value prior to itself) then that value should be replaced with an NA.
The result I'm trying to achieve would look like this:
date value ID
10/20/21 123 1
10/21/21 NA 1
10/22/21 NA 1
10/23/21 125 1
10/24/21 127 1
10/25/21 NA 1
10/26/21 NA 1
10/27/21 128 1
10/28/21 129 1
10/20/21 155 2
10/21/21 156 2
10/22/21 NA 2
10/23/21 NA 2
10/24/21 157 2
10/25/21 158 2
10/26/21 159 2
10/27/21 160 2
10/28/21 162 2
I have tried solving this using a simple lag() function, but this only works for the previous value of each row and doesn't work when more than one number in succession needs to be replaced with an NA.
Is there an efficient way to achieve this result?
CodePudding user response:
Using dd from the Note at the end, compare the current value to the cumulative maximum:
library(data.table)
DT <- as.data.table(dd)
DT[, value2 := ifelse(cummax(value) > value, NA, value)]
DT
giving:
date value ID value2
1: 10/20/21 123 1 123
2: 10/21/21 121 1 NA
3: 10/22/21 122 1 NA
4: 10/23/21 125 1 125
5: 10/24/21 127 1 127
6: 10/25/21 126 1 NA
7: 10/26/21 125 1 NA
8: 10/27/21 128 1 128
9: 10/28/21 129 1 129
10: 10/20/21 155 2 155
11: 10/21/21 156 2 156
12: 10/22/21 152 2 NA
13: 10/23/21 154 2 NA
14: 10/24/21 157 2 157
15: 10/25/21 158 2 158
16: 10/26/21 159 2 159
17: 10/27/21 160 2 160
18: 10/28/21 162 2 162
Note
The input in reproducible form is:
dd <-
structure(list(date = c("10/20/21", "10/21/21", "10/22/21", "10/23/21",
"10/24/21", "10/25/21", "10/26/21", "10/27/21", "10/28/21", "10/20/21",
"10/21/21", "10/22/21", "10/23/21", "10/24/21", "10/25/21", "10/26/21",
"10/27/21", "10/28/21"), value = c(123L, 121L, 122L, 125L, 127L,
126L, 125L, 128L, 129L, 155L, 156L, 152L, 154L, 157L, 158L, 159L,
160L, 162L), ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-18L))
CodePudding user response:
You can apply cummax
without data.table
df$value2 = with(df, ifelse(cummax(value) > value, NA, value))