Home > Blockchain >  Lag back to a Value with Conditions R
Lag back to a Value with Conditions R

Time:05-26

I am in need of a conditional way to lag back to the last row where the value is one number or "level" lower than the current row. Whenever type = "yes", I want to go back one level lower to the last "no" and get the quantity. For example, rows 2 and 3 here are type "yes" and level 5. In that case, I'd like to go back to the last level 4 "no" row, get the quantity, and assign it to a new column. When type is "no" no lagging needs to be done.

Data:

row_id  level  type   quantity
1       4      no     100
2       5      yes    110
3       5      yes    115
4       2      no     500  
5       2      no     375
6       3      yes    250
7       3      yes    260
8       3      yes    420


Desired output:

row_id  level  type  quantity lagged_quantity
1       4      no    100      NA
2       5      yes   110      100
3       5      yes   115      100
4       2      no    500      NA
5       2      no    375      NA
6       3      yes   250      375
7       3      yes   260      375
8       3      yes   420      375

Data:

structure(list(row_id = c(1, 2, 3, 4, 5, 6, 7, 8), level = c(4, 
5, 5, 2, 2, 3, 3, 3), type = c("no", "yes", "yes", "no", "no", 
"yes", "yes", "yes"), quantity = c(100, 110, 115, 500, 375, 250, 
260, 420)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", 
"data.frame"))


Desired output:

structure(list(row_id = c(1, 2, 3, 4, 5, 6, 7, 8), level = c(4, 
5, 5, 2, 2, 3, 3, 3), type = c("no", "yes", "yes", "no", "no", 
"yes", "yes", "yes"), quantity = c(100, 110, 115, 500, 375, 250, 
260, 420), lagged_quantity = c("NA", "100", "100", "NA", "NA", 
"375", "375", "375")), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

@Mossa

CodePudding user response:

Direct solution would be to:

df1 %>% 
  mutate(
    level_id = 1   cumsum(c(1, diff(level)) < 0)
  ) %>%
  mutate(lagged_quantity = if_else(type == "yes", NA_real_, quantity)) %>% 
  fill(lagged_quantity) %>% 
  mutate(lagged_quantity = if_else(type == "no", NA_real_, lagged_quantity))

Where first we retain only the values you would like, and then the missing entries are filled with last known value, and then the no answers, that need not be lagged, are taken out.

CodePudding user response:

An option with data.table

library(data.table)
setDT(df1)[df1[, .(lagged_qty = last(quantity)), .(level, type)][,
   lagged_qty := shift(lagged_qty), .(grp = cumsum(type == 'no'))], 
    lagged_qty := lagged_qty, on = .(level, type)]

-output

> df1
   row_id level   type quantity lagged_qty
    <int> <int> <char>    <int>      <int>
1:      1     4     no      100         NA
2:      2     5    yes      110        100
3:      3     5    yes      115        100
4:      4     2     no      500         NA
5:      5     2     no      375         NA
6:      6     3    yes      250        375
7:      7     3    yes      260        375
8:      8     3    yes      420        375
  • Related