I am working on a simple peak-shaving algorithm for a large time series. I am looking for the most efficient way to cap a column value at a given threshold and carry the remainder over to the next column whenever the value exceeds that threshold.
Given the following example dataset, with a threshold set for each column, the goal is a data.table in which every value is capped at its column's threshold. The remainder is added to the next column's value, which in turn may not exceed its own threshold, and so on up to a certain window limit.
library(data.table)
loads <- data.table(index = 1:3,
                    time1 = c(6600, 3000, 12000),
                    time2 = c(12000, 4000, 2000),
                    time3 = c(0, 0, 0),
                    time4 = c(3000, 12000, 0),
                    time5 = c(5000, 2000, 3000),
                    time6 = c(0, 0, 0),
                    time7 = c(15000, 0, 0))
thresholds <- c("time1" = 5000,
                "time2" = 5000,
                "time3" = 5000,
                "time4" = 12000,
                "time5" = 12000,
                "time6" = 12000,
                "time7" = 5000)
With a window of 7 columns, this should result in the following data.table:
res <- data.table(index = 1:3,
                  time1 = c(5000, 3000, 5000),
                  time2 = c(5000, 4000, 5000),
                  time3 = c(5000, 0, 4000),
                  time4 = c(6600, 12000, 0),
                  time5 = c(5000, 2000, 3000),
                  time6 = c(0, 0, 0),
                  time7 = c(5000, 0, 0))
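Written as a recurrence (the notation is introduced here and does not come from the original post): for one row with values x_k, thresholds T_k and an incoming carry c_0 = 0, each column keeps at most T_k and passes the excess c_k on. Tracing row 1 reproduces the expected time4 = 6600 and shows that 10000 falls off after time7.

```latex
y_k = \min\left(x_k + c_{k-1},\; T_k\right), \qquad
c_k = \max\left(0,\; x_k + c_{k-1} - T_k\right), \qquad c_0 = 0

\text{Row 1:}\quad
\begin{aligned}
k=1:&\ 6600 + 0 = 6600      &&\Rightarrow y_1 = 5000,\ c_1 = 1600\\
k=2:&\ 12000 + 1600 = 13600 &&\Rightarrow y_2 = 5000,\ c_2 = 8600\\
k=3:&\ 0 + 8600 = 8600      &&\Rightarrow y_3 = 5000,\ c_3 = 3600\\
k=4:&\ 3000 + 3600 = 6600   &&\Rightarrow y_4 = 6600,\ c_4 = 0\\
k=5:&\ 5000 + 0 = 5000      &&\Rightarrow y_5 = 5000,\ c_5 = 0\\
k=6:&\ 0 + 0 = 0            &&\Rightarrow y_6 = 0,\ c_6 = 0\\
k=7:&\ 15000 + 0 = 15000    &&\Rightarrow y_7 = 5000,\ c_7 = 10000
\end{aligned}
```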
I know there are obvious ways to do this row by row, but I am looking for a more vectorized, data.table-style approach.
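For reference, one such row-wise approach could look like the sketch below. It is not from the original post: shave_row is a hypothetical helper that walks one row left to right, and apply() runs it over every row of a matrix. Any excess left after the last column is simply dropped. It is handy as a correctness check for a vectorized version, but it is likely to be slow on a large time series.

```r
library(data.table)

# Hypothetical helper: cap each value of one row at its threshold and carry
# the excess into the next column. Excess after the last column is dropped.
shave_row <- function(x, caps) {
  carry <- 0
  for (k in seq_along(x)) {
    v <- x[[k]] + carry              # value plus excess from the previous column
    x[[k]] <- min(v, caps[[k]])      # keep at most this column's threshold
    carry <- max(0, v - caps[[k]])   # pass the rest on
  }
  x                                  # names (time1 .. time7) are preserved
}

loads <- data.table(index = 1:3,
                    time1 = c(6600, 3000, 12000),
                    time2 = c(12000, 4000, 2000),
                    time3 = c(0, 0, 0),
                    time4 = c(3000, 12000, 0),
                    time5 = c(5000, 2000, 3000),
                    time6 = c(0, 0, 0),
                    time7 = c(15000, 0, 0))
thresholds <- c(time1 = 5000, time2 = 5000, time3 = 5000, time4 = 12000,
                time5 = 12000, time6 = 12000, time7 = 5000)

cols <- names(thresholds)
# apply() returns one column per input row, so t() turns it back into rows
shaved <- t(apply(as.matrix(loads[, ..cols]), 1, shave_row, caps = thresholds))
res_rowwise <- cbind(loads[, .(index)], as.data.table(shaved))
res_rowwise
```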
Answer:
I don't think this is easy (or even possible?) with "just" vectorized/data.table-canonical code, but here's a straightforward for loop that does it as data.table-efficiently as is reasonable (I think): it loops over the columns, and each step is vectorized over all rows.
Up front: I add timeX to both thresholds (with an Inf limit) and loads (with a value of 0) as a catch-all column, so we know how much of each row's remainder has been "lost". It's handy to have it for the for loop as well (though it can be done without, with some code rewrite).
library(data.table)
thresholds <- c("time1" = 5000,
                "time2" = 5000,
                "time3" = 5000,
                "time4" = 12000,
                "time5" = 12000,
                "time6" = 12000,
                "time7" = 5000,
                "timeX" = Inf)
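loads[, timeX := 0]
for (ind in seq_along(thresholds)) {
  # the last column (timeX) has no next column to pass a remainder to
  if (ind >= length(thresholds)) break
  nm  <- names(thresholds)[ind]       # current column
  nm1 <- names(thresholds)[ind + 1]   # next column, receives the excess
  rmndr <- pmax(0, loads[[nm]] - thresholds[ind])
  # cap the current column at its threshold ...
  set(loads, i = NULL, j = nm, value = pmin(loads[[nm]], thresholds[ind]))
  # ... and carry the excess over to the next column
  set(loads, i = NULL, j = nm1, value = loads[[nm1]] + rmndr)
}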
loads
# index time1 time2 time3 time4 time5 time6 time7 timeX
# <int> <num> <num> <num> <num> <num> <num> <num> <num>
# 1: 1 5000 5000 5000 6600 5000 0 5000 10000
# 2: 2 3000 4000 0 12000 2000 0 0 0
# 3: 3 5000 5000 4000 0 3000 0 0 0
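Because timeX catches whatever falls off the end, each row's total should be the same before and after shaving. A minimal sketch of that sanity check, assuming the question's loads and the extended thresholds above; time_cols, before and after are illustrative names only:

```r
library(data.table)

loads <- data.table(index = 1:3,
                    time1 = c(6600, 3000, 12000),
                    time2 = c(12000, 4000, 2000),
                    time3 = c(0, 0, 0),
                    time4 = c(3000, 12000, 0),
                    time5 = c(5000, 2000, 3000),
                    time6 = c(0, 0, 0),
                    time7 = c(15000, 0, 0))
thresholds <- c(time1 = 5000, time2 = 5000, time3 = 5000, time4 = 12000,
                time5 = 12000, time6 = 12000, time7 = 5000, timeX = Inf)

time_cols <- setdiff(names(loads), "index")              # time1 .. time7
before <- rowSums(loads[, time_cols, with = FALSE])       # row totals before shaving

loads[, timeX := 0]
for (ind in seq_len(length(thresholds) - 1)) {            # same loop, without the break
  nm  <- names(thresholds)[ind]
  nm1 <- names(thresholds)[ind + 1]
  rmndr <- pmax(0, loads[[nm]] - thresholds[[ind]])
  set(loads, j = nm,  value = pmin(loads[[nm]], thresholds[[ind]]))
  set(loads, j = nm1, value = loads[[nm1]] + rmndr)
}

after <- rowSums(loads[, c(time_cols, "timeX"), with = FALSE])  # kept + lost
stopifnot(isTRUE(all.equal(before, after)))   # totals are conserved
loads$timeX                                   # amount lost per row: 10000 0 0
```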
Or, if you really don't care about the discarded amounts (and so don't need timeX), then:
## using unmodified `loads` and `thresholds` (as defined in the question)
for (ind in seq_along(thresholds)) {
  nm <- names(thresholds)[ind]
  rmndr <- pmax(0, loads[[nm]] - thresholds[nm])
  set(loads, i = NULL, j = nm, value = pmin(loads[[nm]], thresholds[nm]))
  # the excess from the last column is simply dropped
  if (ind == length(thresholds)) break
  nm1 <- names(thresholds)[ind + 1]
  set(loads, i = NULL, j = nm1, value = loads[[nm1]] + rmndr)
}
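If you want this as a reusable step, a minimal sketch is to wrap the same column-by-column idea in a function. shave_peaks is a hypothetical name. Unlike the loops above, it keeps the excess in a separate carry vector instead of first adding it into the next column, and it works on copy(dt) so the input table is not modified by reference. The excess after the last column is dropped.

```r
library(data.table)

# Hypothetical helper: cap each column named in `thresholds` (in that order)
# and push the excess into the next one. Loops over columns, vectorized over rows.
shave_peaks <- function(dt, thresholds) {
  out <- copy(dt)                     # leave the caller's table untouched
  carry <- numeric(nrow(out))         # excess arriving from the previous column
  for (nm in names(thresholds)) {
    v <- out[[nm]] + carry            # this column's value plus incoming excess
    set(out, j = nm, value = pmin(v, thresholds[[nm]]))
    carry <- pmax(0, v - thresholds[[nm]])
  }
  out                                 # whatever is left in `carry` is discarded
}

loads <- data.table(index = 1:3,
                    time1 = c(6600, 3000, 12000),
                    time2 = c(12000, 4000, 2000),
                    time3 = c(0, 0, 0),
                    time4 = c(3000, 12000, 0),
                    time5 = c(5000, 2000, 3000),
                    time6 = c(0, 0, 0),
                    time7 = c(15000, 0, 0))
thresholds <- c(time1 = 5000, time2 = 5000, time3 = 5000, time4 = 12000,
                time5 = 12000, time6 = 12000, time7 = 5000)

shaved <- shave_peaks(loads, thresholds)
shaved
```

Dropping the copy() call would make the function update dt by reference, as the loops above do, which avoids holding a second copy of a large table in memory.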