R data.table join to find last row satisfying condition

Time:10-28

For each row of a data.table, I'd like to get the last preceding row that satisfies a condition.

I want to do this with a join, because that is by far the fastest way.

library(data.table)
mydt <- data.table(condition = c(F,F,T,F,F,T,F,F,F,F), row = 1:10)
> mydt
    condition   row
       <lgcl> <int>
 1:     FALSE     1
 2:     FALSE     2
 3:      TRUE     3
 4:     FALSE     4
 5:     FALSE     5
 6:      TRUE     6
 7:     FALSE     7
 8:     FALSE     8
 9:     FALSE     9
10:     FALSE    10

Basically, for each row where condition is TRUE, I'd like the previous row where condition is TRUE. Here the sixth element should be 3 and all the rest NA.

I tried to compute this over both TRUE and FALSE, because for some reason I can't use on = .(condition == T):

mydt[
  mydt,
  on = .(condition == condition, row < row),
  .(result = row),
  mult = "last"]$result

# [1]  1  2  3  4  5  6  7  8  9 10
# expected result: NA, 1, NA, 2, 4, 3, 5, 7, 8, 9
# OR expected result (only for TRUE): NA, NA, NA, NA, NA, 3, NA, NA, NA, NA

Any help? thanks


EDIT: The code below produces the expected result with dplyr, but I am still after a data.table join solution.

library(dplyr)
mydt %>% as.data.frame() %>% group_by(condition) %>% mutate(prev_result = lag(row))
   condition   row prev_result
   <lgl>     <int>       <int>
 1 FALSE         1          NA
 2 FALSE         2           1
 3 TRUE          3          NA
 4 FALSE         4           2
 5 FALSE         5           4
 6 TRUE          6           3
 7 FALSE         7           5
 8 FALSE         8           7
 9 FALSE         9           8
10 FALSE        10           9

CodePudding user response:

One way to reproduce the result with data.table is to lag row within each condition group using shift():

mydt[, prev_results := shift(row, n=1, type="lag"), by = condition]
mydt
    condition row prev_results
 1:     FALSE   1           NA
 2:     FALSE   2            1
 3:      TRUE   3           NA
 4:     FALSE   4            2
 5:     FALSE   5            4
 6:      TRUE   6            3
 7:     FALSE   7            5
 8:     FALSE   8            7
 9:     FALSE   9            8
10:     FALSE  10            9
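
If only the TRUE rows are of interest (the second expected result in the question), the grouped lag can be masked afterwards. This is a minimal sketch, assuming a data.table version that provides fifelse(); the column name prev_true is made up for illustration.

```r
library(data.table)

mydt <- data.table(condition = c(F, F, T, F, F, T, F, F, F, F), row = 1:10)

# Lag `row` within each condition group, as in the answer above.
mydt[, prev_results := shift(row, n = 1, type = "lag"), by = condition]

# Keep the lagged value only where condition is TRUE; NA elsewhere.
# `prev_true` is a hypothetical column name.
mydt[, prev_true := fifelse(condition, prev_results, NA_integer_)]

mydt$prev_true
```

The masking step is separate from the lag, so the full per-group result in prev_results stays available.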

CodePudding user response:

You'll want to ask for x.row explicitly in j. With this non-equi join, a bare row seems to return the values from i (that is, i.row), which is why your result is just 1 to 10.

library(data.table)

mydt <- data.table(condition = c(F, F, T, F, F, T, F, F, F, F), row = 1:10)

mydt[
  mydt,
  on = .(condition == condition, row < row),
  mult = "last",
  .(condition, i.row, x.row)
]
#>     condition i.row x.row
#>  1:     FALSE     1    NA
#>  2:     FALSE     2     1
#>  3:      TRUE     3    NA
#>  4:     FALSE     4     2
#>  5:     FALSE     5     4
#>  6:      TRUE     6     3
#>  7:     FALSE     7     5
#>  8:     FALSE     8     7
#>  9:     FALSE     9     8
#> 10:     FALSE    10     9
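
The question also mentions not being able to write on = .(condition == T). One possible workaround, sketched here under the assumption that only the TRUE rows matter, is to filter i down to the TRUE rows before joining, then write the result back with an update join. The names true_rows, res and prev_true are made up for illustration.

```r
library(data.table)

mydt <- data.table(condition = c(F, F, T, F, F, T, F, F, F, F), row = 1:10)

# Use only the TRUE rows as the lookup table `i`, so the constant
# never has to appear inside `on`.
true_rows <- mydt[condition == TRUE]

# For each TRUE row, find the last earlier row with the same condition.
res <- mydt[
  true_rows,
  on = .(condition, row < row),
  mult = "last",
  .(row = i.row, prev_true = x.row)
]

# Update join: copy the matches back; rows without a match stay NA.
mydt[res, on = "row", prev_true := i.prev_true]

mydt$prev_true
```

Filtering i first keeps the join small when TRUE rows are rare, at the cost of a second join to bring the values back to the full table.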

CodePudding user response:

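# Positions of the rows where condition is TRUE
true_idx <- mydt[condition == TRUE, which = TRUE]
# Hard-coded for this example: the second TRUE row gets the position of the first
mydt[true_idx[2], prev_result := true_idx[1]]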