I have the below df:
df <- data.table(user = c('a', 'a', 'a', 'b', 'b')
, spend = 1:5
, shift_by = c(1,1,2,1,1)
); df
user spend shift_by
1: a 1 1
2: a 2 1
3: a 3 2
4: b 4 1
5: b 5 1
and am looking to create a lead lag column only this time the n
parameter in data.table
's shift
function is dynamic and takes df$shiftby
as input. My expected result is:
df[, spend_shifted := c(NA, 1, 1, NA, 4)]; df
user spend shift_by spend_shifted
1: a 1 1 NA
2: a 2 1 1
3: a 3 2 1
4: b 4 1 NA
5: b 5 1 4
However, with the below attempt it gives:
df[, spend_shifted := shift(x=spend, n=shift_by, type="lag"), x]; df
user spend shift_by spend_shifted
1: a 1 1 NA
2: a 2 1 NA
3: a 3 2 NA
4: b 4 1 NA
5: b 5 1 NA
This is the closest example I could find.
However, I need a group by and am after a data.table
solution because of speed.
Truly look forward to finding any ideas. Thank you
CodePudding user response:
I believe this will work. You can drop the newindex-column afterward.
df[, newindex := rowid(user) - shift_by]
df[newindex > 0, spend_shifted := df[, spend[newindex], by = .(user)]$V1]
# user spend shift_by newindex spend_shifted
# 1: a 1 1 0 NA
# 2: a 2 1 1 1
# 3: a 3 2 1 1
# 4: b 4 1 0 NA
# 5: b 5 1 1 4
CodePudding user response:
Using matrix subsetting of data.frames:
df[,
spend_shifted :=
data.frame(shift(spend, n = unique(sort(shift_by))))[cbind(1:.N, shift_by)],
by = user]
Another solution (in addition to Wimpel's) without shift:
df[, {rows <- 1:nrow(.SD) - shift_by; .SD[replace(rows, rows == 0, NA), spend]},
by = user]
CodePudding user response:
Here's another approach, using a data.table join. I use two helper-columns to join on:
df[, row := .I, by = .(user)]
df[, match_row := row - shift_by]
df[df, on = .(user, match_row = row), x := i.spend]
df[, c('row', 'match_row') := NULL]
# user spend shift_by spend_shifted x
# 1: a 1 1 NA NA
# 2: a 2 1 1 1
# 3: a 3 2 1 1
# 4: b 4 1 NA NA
# 5: b 5 1 4 4
CodePudding user response:
Another thought:
df[, spend_shifted := spend[{o <- seq_len(.N) - shift_by; o[o<1] <- NA; o; }], by = user]
# user spend shift_by spend_shifted
# <char> <int> <num> <int>
# 1: a 1 1 NA
# 2: a 2 1 1
# 3: a 3 2 1
# 4: b 4 1 NA
# 5: b 5 1 4
I don't generally like {..}
code inside my data.table::[
blocks, but it's not too long here. It could be replaced by a simple udf:
func <- function(i, j) { o <- seq_len(i) - j; o[o < 1] <- NA; o; }
df[, spend_shifted := spend[func(.N, shift_by)], by = user]
CodePudding user response:
Maybe this could help
> df[, spend_shifted := spend[replace(seq(.N) - shift_by, seq(.N) == shift_by, NA)], user][]
user spend shift_by spend_shifted
1: a 1 1 NA
2: a 2 1 1
3: a 3 2 1
4: b 4 1 NA
5: b 5 1 4