I'd like to get the last row satisfying a condition in a data.table.
For this I need to use a join, because this is by far the fastest way.
mydt <- data.table(condition = c(F,F,T,F,F,T,F,F,F,F), row = 1:10)
> mydt
condition row
<lgcl> <int>
1: FALSE 1
2: FALSE 2
3: TRUE 3
4: FALSE 4
5: FALSE 5
6: TRUE 6
7: FALSE 7
8: FALSE 8
9: FALSE 9
10: FALSE 10
Basically, for each row where condition is TRUE, I'd like the row value of the previous row where condition is TRUE. Here the sixth element should be 3, and all the rest NA.
I tried to compute this over both TRUE and FALSE, because for some reason I can't use on = .(condition == T):
mydt[
mydt,
on = .(condition == condition, row < row),
.(result = row),
mult = "last"]$result
# [1] 1 2 3 4 5 6 7 8 9 10
# expected result: NA, 1, NA, 2, 4, 3, 5, 7, 8, 9
# OR expected result (only for TRUE): NA, NA, NA, NA, NA, 3, NA, NA, NA, NA
Any help? thanks
EDIT: The code below produces the expected result in dplyr, but I am still after a data.table join solution.
mydt %>% as.data.frame() %>% group_by(condition) %>% mutate(prev_result = lag(row))
condition row prev_result
<lgl> <int> <int>
1 FALSE 1 NA
2 FALSE 2 1
3 TRUE 3 NA
4 FALSE 4 2
5 FALSE 5 4
6 TRUE 6 3
7 FALSE 7 5
8 FALSE 8 7
9 FALSE 9 8
10 FALSE 10 9
CodePudding user response:
This is one way to reproduce the result with data.table, using a grouped shift() instead of a join:
mydt[, prev_results := shift(row, n=1, type="lag"), by = condition]
mydt
condition row prev_results
1: FALSE 1 NA
2: FALSE 2 1
3: TRUE 3 NA
4: FALSE 4 2
5: FALSE 5 4
6: TRUE 6 3
7: FALSE 7 5
8: FALSE 8 7
9: FALSE 9 8
10: FALSE 10 9
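If only the TRUE rows need a previous value (the second expected result in the question), the same shift() idea can probably be restricted through i instead of grouping. A minimal sketch, assuming a fresh copy of the question's table; the column name prev_true is made up for illustration:

```r
library(data.table)

mydt <- data.table(condition = c(F, F, T, F, F, T, F, F, F, F), row = 1:10)

# Assign by reference only within the rows where condition is TRUE.
# Rows not selected by i are left as NA in the new column (prev_true is a hypothetical name).
mydt[condition == TRUE, prev_true := shift(row, n = 1, type = "lag")]

mydt$prev_true
# [1] NA NA NA NA NA  3 NA NA NA NA
```

This is not a join, so it does not answer the request for a join solution, but it is a handy reference to check a join against.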
CodePudding user response:
You'll want to keep the x.row in the result. By default you seem to get the i.row instead.
library(data.table)
mydt <- data.table(condition = c(F, F, T, F, F, T, F, F, F, F), row = 1:10)
mydt[
mydt,
on = .(condition == condition, row < row),
mult = "last",
.(condition, i.row, x.row)
]
#> condition i.row x.row
#> 1: FALSE 1 NA
#> 2: FALSE 2 1
#> 3: TRUE 3 NA
#> 4: FALSE 4 2
#> 5: FALSE 5 4
#> 6: TRUE 6 3
#> 7: FALSE 7 5
#> 8: FALSE 8 7
#> 9: FALSE 9 8
#> 10: FALSE 10 9
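For the TRUE-only variant from the question, one option that seems to work is to subset x before joining, rather than putting the filter inside on. A minimal sketch under that assumption; the output column names row and prev_true are chosen here for illustration:

```r
library(data.table)

mydt <- data.table(condition = c(F, F, T, F, F, T, F, F, F, F), row = 1:10)

# x contains only the TRUE rows; i is the full table, so the result
# keeps one row per row of mydt, in the original order.
res <- mydt[condition == TRUE][
  mydt,
  on = .(condition == condition, row < row),
  mult = "last",
  .(row = i.row, prev_true = x.row)   # hypothetical output names
]

res$prev_true
# [1] NA NA NA NA NA  3 NA NA NA NA
```

FALSE rows in i find no partner in the subsetted x, so they come back as NA, as does the first TRUE row, which has no earlier TRUE row.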
CodePudding user response:
mydt[mydt[condition == TRUE, which = TRUE][2], "prev_result"] <- mydt[condition == TRUE, which = TRUE][1]