I have the following problem:
dt = data.table(check=c("H", "L","L","L","H","L","L","L"),
old = c(8, 3, 2.8, 2.7, 2.3, 2.3, 1.7, 1.4),
new = c(19, 2.8, 2.7, 2.3, 21, 1.7, 1.4, 1.1),
beg_low = rep(3.0, 8),
beg_high = rep(8, 8),
band_low = rep(0, 8),
band_high = rep(0, 8))
> dt
check old new beg_low beg_high band_low band_high
1: H 8.0 19.0 3 8 0 0
2: L 3.0 2.8 3 8 0 0
3: L 2.8 2.7 3 8 0 0
4: L 2.7 2.3 3 8 0 0
5: H 2.3 21.0 3 8 0 0
6: L 2.3 1.7 3 8 0 0
7: L 1.7 1.4 3 8 0 0
8: L 1.4 1.1 3 8 0 0
Here we need to update the values of band_low and band_high in the following manner:
The fields band_low and band_high need to be filled using the values of the fields check, new, beg_low and beg_high.
band_low and band_high take their initial values from beg_low and beg_high.
If check is "H", then band_low would remain unchanged, i.e., it would be equal to beg_low. band_high, on the other hand, would be revised, using the field new. In the data we have, band_high would now become 19.
If check is "L", on the other hand, then band_high would be unchanged, and would retain its previous value. band_low would be revised, and would now become 2.8, in our data. Subsequent values of check are all "L", hence band_low keeps getting revised using new, but band_high remains unchanged.
So the answer would be:
dt.ans = data.table(check=c("H", "L","L","L","H","L","L","L"),
old = c(8, 3, 2.8, 2.7, 2.3, 2.3, 1.7, 1.4),
new = c(19, 2.8, 2.7, 2.3, 21, 1.7, 1.4, 1.1),
beg_low = rep(3.0, 8),
beg_high = rep(8, 8),
band_low = c(3,2.8, 2.7, 2.3, 2.3, 1.7, 1.4, 1.1),
band_high = c(19, 19, 19, 19, 21, 21, 21,21))
> dt.ans
check old new beg_low beg_high band_low band_high
1: H 8.0 19.0 3 8 3.0 19
2: L 3.0 2.8 3 8 2.8 19
3: L 2.8 2.7 3 8 2.7 19
4: L 2.7 2.3 3 8 2.3 19
5: H 2.3 21.0 3 8 2.3 21
6: L 2.3 1.7 3 8 1.7 21
7: L 1.7 1.4 3 8 1.4 21
8: L 1.4 1.1 3 8 1.1 21
I tried to solve this using data.table:
dt[, ':=' (band_low = beg_low,
band_high = beg_high)]
dt[, band_low := ifelse(check=="L", new, shift(band_low)), 1:nrow(dt)]
dt[, band_high := ifelse(check=="L", shift(band_high), new), 1:nrow(dt)]
dt[, band_high := ifelse(check=="H", new, shift(band_high)), 1:nrow(dt)]
dt[, band_low := ifelse(check=="H", shift(band_low), new), 1:nrow(dt)]
> dt
check old new beg_low beg_high band_low band_high
1: H 8.0 19.0 3 8 NA 19
2: L 3.0 2.8 3 8 2.8 NA
3: L 2.8 2.7 3 8 2.7 NA
4: L 2.7 2.3 3 8 2.3 NA
5: H 2.3 21.0 3 8 NA 21
6: L 2.3 1.7 3 8 1.7 NA
7: L 1.7 1.4 3 8 1.4 NA
8: L 1.4 1.1 3 8 1.1 NA
Somehow the previous values of the variables are being missed, with the result being NAs.
What am I missing here please? Solving the problem using if clauses in a data.frame manner is possible, but very slow, and not scalable.
Any help would be greatly appreciated.
CodePudding user response:
You could use nafill
with locf
(last occurence carried forward) option :
dt[,band_high:=NA]
dt[,band_low:=NA]
dt[,band_high:=fifelse(check=='H',new,band_high)]
dt[,band_low:=fifelse(check=='L',new,band_low)]
dt[,band_high:=nafill(band_high,'locf')]
dt[,band_low:=nafill(band_low,'locf')]
dt[is.na(band_low),band_low:=beg_low]
dt[is.na(band_high),band_high:=beg_high]
all.equal(dt,dt.ans)
#[1] TRUE