Shift with dynamic n (number of position lead / lag by)-CodePudding

I have the below df:

df <- data.table(user = c('a', 'a', 'a', 'b', 'b')
                 , spend = 1:5
                 , shift_by = c(1,1,2,1,1)
                 ); df

   user spend shift_by
1:    a     1        1
2:    a     2        1
3:    a     3        2
4:    b     4        1
5:    b     5        1

and am looking to create a lead lag column only this time the n parameter in data.table's shift function is dynamic and takes df$shiftby as input. My expected result is:

df[, spend_shifted := c(NA, 1, 1, NA, 4)]; df

   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1             1
3:    a     3        2             1
4:    b     4        1            NA
5:    b     5        1             4

However, with the below attempt it gives:

df[, spend_shifted := shift(x=spend, n=shift_by, type="lag"), x]; df

   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1            NA
3:    a     3        2            NA
4:    b     4        1            NA
5:    b     5        1            NA

This is the closest example I could find. However, I need a group by and am after a data.table solution because of speed. Truly look forward to finding any ideas. Thank you

CodePudding user response：

I believe this will work. You can drop the newindex-column afterward.

df[, newindex := rowid(user) - shift_by]
df[newindex > 0, spend_shifted := df[, spend[newindex], by = .(user)]$V1]
#    user spend shift_by newindex spend_shifted
# 1:    a     1        1        0            NA
# 2:    a     2        1        1             1
# 3:    a     3        2        1             1
# 4:    b     4        1        0            NA
# 5:    b     5        1        1             4

CodePudding user response：

Using matrix subsetting of data.frames:

df[, 
   spend_shifted := 
     data.frame(shift(spend, n = unique(sort(shift_by))))[cbind(1:.N, shift_by)], 
   by = user]

Another solution (in addition to Wimpel's) without shift:

df[, {rows <- 1:nrow(.SD) - shift_by; .SD[replace(rows, rows == 0, NA), spend]}, 
   by = user]

CodePudding user response：

Here's another approach, using a data.table join. I use two helper-columns to join on:

df[, row := .I, by = .(user)]
df[, match_row := row - shift_by]
df[df, on = .(user, match_row = row), x := i.spend]
df[, c('row', 'match_row') := NULL]


#    user spend shift_by spend_shifted  x
# 1:    a     1        1            NA NA
# 2:    a     2        1             1  1
# 3:    a     3        2             1  1
# 4:    b     4        1            NA NA
# 5:    b     5        1             4  4

CodePudding user response：

Another thought:

df[, spend_shifted := spend[{o <- seq_len(.N) - shift_by; o[o<1] <- NA; o; }], by = user]
#      user spend shift_by spend_shifted
#    <char> <int>    <num>         <int>
# 1:      a     1        1            NA
# 2:      a     2        1             1
# 3:      a     3        2             1
# 4:      b     4        1            NA
# 5:      b     5        1             4

I don't generally like {..} code inside my data.table::[ blocks, but it's not too long here. It could be replaced by a simple udf:

func <- function(i, j) { o <- seq_len(i) - j; o[o < 1] <- NA; o; }
df[, spend_shifted := spend[func(.N, shift_by)], by = user]

CodePudding user response：

Maybe this could help

> df[, spend_shifted := spend[replace(seq(.N) - shift_by, seq(.N) == shift_by, NA)], user][]
   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1             1
3:    a     3        2             1
4:    b     4        1            NA
5:    b     5        1             4