I have a data.table as so:
library(data.table)
dt.tst <- CJ(Type = c("A", "B"),
Range_val = seq(0,20000, by = 1000))
dt.tst[Range_val == 2000 & Type == "A", Value := 0.987]
dt.tst[Range_val == 2000 & Type == "B", Value := 1.987]
dt.tst[Range_val == 9000 & Type == "A", Value := 1.056]
dt.tst[Range_val == 9000 & Type == "B", Value := 2.138]
dt.tst[Range_val == 16000 & Type == "A", Value := 1.563]
dt.tst[Range_val == 16000 & Type == "B", Value := 2.089]
I would like to fill the NA:s in the Value-column:
- With the closest non-na-value by the Range_val-column and Type.
- In case of a tie the highest Range_val and it's corresponding Value should be used (but this is not a deal-breaker, speed is more important).
I could do this pretty simple but slow with a for-loop. So I would like a more clean way of doing this (and faster). What would be a good data.table-way of doing this?
There will always be non-na-values but the intervals might differ.
CodePudding user response:
dt.tst[is.na(Value), Value := dt.tst[!is.na(Value)][dt.tst[is.na(Value)], roll = "nearest", on = .(Type, Range_val)]$Value]
output
dt.tst
# Type Range_val Value
# 1: A 0 0.987
# 2: A 1000 0.987
# 3: A 2000 0.987
# 4: A 3000 0.987
# 5: A 4000 0.987
# 6: A 5000 0.987
# 7: A 6000 1.056
# 8: A 7000 1.056
# 9: A 8000 1.056
# 10: A 9000 1.056
# 11: A 10000 1.056
# 12: A 11000 1.056
# 13: A 12000 1.056
# 14: A 13000 1.563
# 15: A 14000 1.563
# 16: A 15000 1.563
# 17: A 16000 1.563
# 18: A 17000 1.563
# 19: A 18000 1.563
# 20: A 19000 1.563
# 21: A 20000 1.563
# 22: B 0 1.987
# 23: B 1000 1.987
# 24: B 2000 1.987
# 25: B 3000 1.987
# 26: B 4000 1.987
# 27: B 5000 1.987
# 28: B 6000 2.138
# 29: B 7000 2.138
# 30: B 8000 2.138
# 31: B 9000 2.138
# 32: B 10000 2.138
# 33: B 11000 2.138
# 34: B 12000 2.138
# 35: B 13000 2.089
# 36: B 14000 2.089
# 37: B 15000 2.089
# 38: B 16000 2.089
# 39: B 17000 2.089
# 40: B 18000 2.089
# 41: B 19000 2.089
# 42: B 20000 2.089
CodePudding user response:
If I understand your goal correctly, i,e., assigning a NA
in the Value
column with its nearest non-NA
value, probably you can try max.col
outer
like below
dt.tst[
,
Value2 := replace(
Value,
is.na(Value),
na.omit(Value)[max.col(-abs(outer(which(is.na(Value)), which(!is.na(Value)), `-`)), "last")]
), Type
][]
which gives
Type Range_val Value Value2
1: A 0 NA 0.987
2: A 1000 NA 0.987
3: A 2000 0.987 0.987
4: A 3000 NA 0.987
5: A 4000 NA 0.987
6: A 5000 NA 0.987
7: A 6000 NA 1.056
8: A 7000 NA 1.056
9: A 8000 NA 1.056
10: A 9000 1.056 1.056
11: A 10000 NA 1.056
12: A 11000 NA 1.056
13: A 12000 NA 1.056
14: A 13000 NA 1.563
15: A 14000 NA 1.563
16: A 15000 NA 1.563
17: A 16000 1.563 1.563
18: A 17000 NA 1.563
19: A 18000 NA 1.563
20: A 19000 NA 1.563
21: A 20000 NA 1.563
22: B 0 NA 1.987
23: B 1000 NA 1.987
24: B 2000 1.987 1.987
25: B 3000 NA 1.987
26: B 4000 NA 1.987
27: B 5000 NA 1.987
28: B 6000 NA 2.138
29: B 7000 NA 2.138
30: B 8000 NA 2.138
31: B 9000 2.138 2.138
32: B 10000 NA 2.138
33: B 11000 NA 2.138
34: B 12000 NA 2.138
35: B 13000 NA 2.089
36: B 14000 NA 2.089
37: B 15000 NA 2.089
38: B 16000 2.089 2.089
39: B 17000 NA 2.089
40: B 18000 NA 2.089
41: B 19000 NA 2.089
42: B 20000 NA 2.089
Type Range_val Value Value2