I am in need of a conditional way to lag back to the last row where the value is one number or "level" lower than the current row. Whenever type = "yes", I want to go back one level lower to the last "no" and get the quantity. For example, rows 2 and 3 here are type "yes" and level 5. In that case, I'd like to go back to the last level 4 "no" row, get the quantity, and assign it to a new column. When type is "no" no lagging needs to be done.
Data:
row_id level type quantity
1 4 no 100
2 5 yes 110
3 5 yes 115
4 2 no 500
5 2 no 375
6 3 yes 250
7 3 yes 260
8 3 yes 420
Desired output:
row_id level type quantity lagged_quantity
1 4 no 100 NA
2 5 yes 110 100
3 5 yes 115 100
4 2 no 500 NA
5 2 no 375 NA
6 3 yes 250 375
7 3 yes 260 375
8 3 yes 420 375
Data:
structure(list(row_id = c(1, 2, 3, 4, 5, 6, 7, 8), level = c(4,
5, 5, 2, 2, 3, 3, 3), type = c("no", "yes", "yes", "no", "no",
"yes", "yes", "yes"), quantity = c(100, 110, 115, 500, 375, 250,
260, 420)), row.names = c(NA, -8L), class = c("tbl_df", "tbl",
"data.frame"))
Desired output:
structure(list(row_id = c(1, 2, 3, 4, 5, 6, 7, 8), level = c(4,
5, 5, 2, 2, 3, 3, 3), type = c("no", "yes", "yes", "no", "no",
"yes", "yes", "yes"), quantity = c(100, 110, 115, 500, 375, 250,
260, 420), lagged_quantity = c("NA", "100", "100", "NA", "NA",
"375", "375", "375")), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
@Mossa
CodePudding user response:
Direct solution would be to:
df1 %>%
mutate(
level_id = 1 cumsum(c(1, diff(level)) < 0)
) %>%
mutate(lagged_quantity = if_else(type == "yes", NA_real_, quantity)) %>%
fill(lagged_quantity) %>%
mutate(lagged_quantity = if_else(type == "no", NA_real_, lagged_quantity))
Where first we retain only the values you would like, and then the missing entries are filled with last known value, and then the no
answers, that need not be lagged, are taken out.
CodePudding user response:
An option with data.table
library(data.table)
setDT(df1)[df1[, .(lagged_qty = last(quantity)), .(level, type)][,
lagged_qty := shift(lagged_qty), .(grp = cumsum(type == 'no'))],
lagged_qty := lagged_qty, on = .(level, type)]
-output
> df1
row_id level type quantity lagged_qty
<int> <int> <char> <int> <int>
1: 1 4 no 100 NA
2: 2 5 yes 110 100
3: 3 5 yes 115 100
4: 4 2 no 500 NA
5: 5 2 no 375 NA
6: 6 3 yes 250 375
7: 7 3 yes 260 375
8: 8 3 yes 420 375