Home > Enterprise >  How to embed an and operator inside a data.table function?
How to embed an and operator inside a data.table function?

Time:05-24

Suppose we start with this dataframe generated by the code immediately beneath:

> data1
   ID Period Values_1 Values_2 State
1   1      1        5        5    X0
2   1      2        0        2    X1
3   1      3        0        0    X2
4   1      4        0       12    X1
5   2      1        1        2    X0
6   2      2       -1        0    X2
7   2      3        0        1    X0
8   2      4        0        0    X0
9   3      1        0        0    X2
10  3      2        0        0    X1
11  3      3        0        0    X9
12  3      4        0        2    X3
13  4      1        1        4    X2
14  4      2        2        5    X1
15  4      3        3        6    X9
16  4      4        0        0    X3

data1 <- 
  data.frame(
    ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
    Period = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
    Values_1 = c(5, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0),
    Values_2 = c(5, 2, 0, 12, 2, 0, 1, 0, 0, 0, 0, 2, 4, 5, 6, 0),
    State = c("X0","X1","X2","X1","X0","X2","X0","X0", "X2","X1","X9","X3", "X2","X1","X9","X3")
  )

I've been using this data.table code to flag each ID with an "END" in State_1 when it no longer generates values in its future periods:

setDT(data1)[, State1 := ifelse(rev(cumsum(rev(Values_1   Values_2))), State, "END"), ID]

The above code gives these results:

> data1
    ID Period Values_1 Values_2 State State1
 1:  1      1        5        5    X0     X0
 2:  1      2        0        2    X1     X1
 3:  1      3        0        0    X2     X2
 4:  1      4        0       12    X1     X1
 5:  2      1        1        2    X0     X0
 6:  2      2       -1        0    X2    END
 7:  2      3        0        1    X0     X0
 8:  2      4        0        0    X0    END
 9:  3      1        0        0    X2     X2
10:  3      2        0        0    X1     X1
11:  3      3        0        0    X9     X9
12:  3      4        0        2    X3     X3
13:  4      1        1        4    X2     X2
14:  4      2        2        5    X1     X1
15:  4      3        3        6    X9     X9
16:  4      4        0        0    X3    END

When I would like to give these results for ID = 2 instead:

> data1
    ID Period Values_1 Values_2 State State1
 1:  1      1        5        5    X0     X0
 2:  1      2        0        2    X1     X1
 3:  1      3        0        0    X2     X2
 4:  1      4        0       12    X1     X1
 5:  2      1        1        2    X0     X0
 6:  2      2       -1        0    X2     X2
 7:  2      3        0        1    X0     X0
 8:  2      4        0        0    X0    END
 9:  3      1        0        0    X2     X2
10:  3      2        0        0    X1     X1
11:  3      3        0        0    X9     X9
12:  3      4        0        2    X3     X3
13:  4      1        1        4    X2     X2
14:  4      2        2        5    X1     X1
15:  4      3        3        6    X9     X9
16:  4      4        0        0    X3    END

In order to do this, I need to change the data.table code to in effect something like the below (it doesn't work), where if BOTH the future period values for an ID of Values_1 AND Values_2 (separately calculated) = 0, then the State_1 for that ID is flagged END for all of its future periods. How can this be done in data.table?

setDT(data1)[, State1 := ifelse(rev(cumsum(rev(Values_1))) & rev(cumsum(rev(Values_2))), State, "END"), ID]

This is linked with related post How to use dplyr or data.table to perform look-ahead calculations by groups of data subsets?

CodePudding user response:

Perhaps something like this:

f <- function(v1,v2,s) {
  s[cumsum(abs(v1) abs(v2))==0] <- "END"
  s
}

setDT(data1)[order(-Period), State1:=f(Values_1, Values_2, State), by=ID]

Output:

    ID Period Values_1 Values_2 State State1
 1:  1      1        5        5    X0     X0
 2:  1      2        0        2    X1     X1
 3:  1      3        0        0    X2     X2
 4:  1      4        0       12    X1     X1
 5:  2      1        1        2    X0     X0
 6:  2      2       -1        0    X2     X2
 7:  2      3        0        1    X0     X0
 8:  2      4        0        0    X0    END
 9:  3      1        0        0    X2     X2
10:  3      2        0        0    X1     X1
11:  3      3        0        0    X9     X9
12:  3      4        0        2    X3     X3
13:  4      1        1        4    X2     X2
14:  4      2        2        5    X1     X1
15:  4      3        3        6    X9     X9
16:  4      4        0        0    X3    END

CodePudding user response:

The answer in the linked post appears to be assuming non-negative values for Values_1 and Values_2. If there are negatives, insert an abs into the data.table expression:

setDT(data1)[, State1 := ifelse(rev(cumsum(rev(Values_1 | Values_2))), State, "END"), ID]
  • Related