Home > Mobile >  Fill data.table with NA-values, using the nearest non-na-observation
Fill data.table with NA-values, using the nearest non-na-observation

Time:09-13

I have a data.table as so:

library(data.table)
dt.tst <- CJ(Type = c("A", "B"),
             Range_val = seq(0,20000, by = 1000))


dt.tst[Range_val == 2000 & Type == "A", Value := 0.987]
dt.tst[Range_val == 2000 & Type == "B", Value := 1.987]

dt.tst[Range_val == 9000 & Type == "A", Value := 1.056]
dt.tst[Range_val == 9000 & Type == "B", Value := 2.138]

dt.tst[Range_val == 16000 & Type == "A", Value := 1.563]
dt.tst[Range_val == 16000 & Type == "B", Value := 2.089]

I would like to fill the NA:s in the Value-column:

  • With the closest non-na-value by the Range_val-column and Type.
  • In case of a tie the highest Range_val and it's corresponding Value should be used (but this is not a deal-breaker, speed is more important).

I could do this pretty simple but slow with a for-loop. So I would like a more clean way of doing this (and faster). What would be a good data.table-way of doing this?

There will always be non-na-values but the intervals might differ.

CodePudding user response:

dt.tst[is.na(Value), Value := dt.tst[!is.na(Value)][dt.tst[is.na(Value)], roll = "nearest", on = .(Type, Range_val)]$Value]

output

dt.tst

#     Type Range_val Value
#  1:    A         0 0.987
#  2:    A      1000 0.987
#  3:    A      2000 0.987
#  4:    A      3000 0.987
#  5:    A      4000 0.987
#  6:    A      5000 0.987
#  7:    A      6000 1.056
#  8:    A      7000 1.056
#  9:    A      8000 1.056
# 10:    A      9000 1.056
# 11:    A     10000 1.056
# 12:    A     11000 1.056
# 13:    A     12000 1.056
# 14:    A     13000 1.563
# 15:    A     14000 1.563
# 16:    A     15000 1.563
# 17:    A     16000 1.563
# 18:    A     17000 1.563
# 19:    A     18000 1.563
# 20:    A     19000 1.563
# 21:    A     20000 1.563
# 22:    B         0 1.987
# 23:    B      1000 1.987
# 24:    B      2000 1.987
# 25:    B      3000 1.987
# 26:    B      4000 1.987
# 27:    B      5000 1.987
# 28:    B      6000 2.138
# 29:    B      7000 2.138
# 30:    B      8000 2.138
# 31:    B      9000 2.138
# 32:    B     10000 2.138
# 33:    B     11000 2.138
# 34:    B     12000 2.138
# 35:    B     13000 2.089
# 36:    B     14000 2.089
# 37:    B     15000 2.089
# 38:    B     16000 2.089
# 39:    B     17000 2.089
# 40:    B     18000 2.089
# 41:    B     19000 2.089
# 42:    B     20000 2.089

CodePudding user response:

If I understand your goal correctly, i,e., assigning a NA in the Value column with its nearest non-NA value, probably you can try max.col outer like below

dt.tst[
  ,
  Value2 := replace(
    Value,
    is.na(Value),
    na.omit(Value)[max.col(-abs(outer(which(is.na(Value)), which(!is.na(Value)), `-`)), "last")]
  ), Type
][]

which gives

    Type Range_val Value Value2
 1:    A         0    NA  0.987
 2:    A      1000    NA  0.987
 3:    A      2000 0.987  0.987
 4:    A      3000    NA  0.987
 5:    A      4000    NA  0.987
 6:    A      5000    NA  0.987
 7:    A      6000    NA  1.056
 8:    A      7000    NA  1.056
 9:    A      8000    NA  1.056
10:    A      9000 1.056  1.056
11:    A     10000    NA  1.056
12:    A     11000    NA  1.056
13:    A     12000    NA  1.056
14:    A     13000    NA  1.563
15:    A     14000    NA  1.563
16:    A     15000    NA  1.563
17:    A     16000 1.563  1.563
18:    A     17000    NA  1.563
19:    A     18000    NA  1.563
20:    A     19000    NA  1.563
21:    A     20000    NA  1.563
22:    B         0    NA  1.987
23:    B      1000    NA  1.987
24:    B      2000 1.987  1.987
25:    B      3000    NA  1.987
26:    B      4000    NA  1.987
27:    B      5000    NA  1.987
28:    B      6000    NA  2.138
29:    B      7000    NA  2.138
30:    B      8000    NA  2.138
31:    B      9000 2.138  2.138
32:    B     10000    NA  2.138
33:    B     11000    NA  2.138
34:    B     12000    NA  2.138
35:    B     13000    NA  2.089
36:    B     14000    NA  2.089
37:    B     15000    NA  2.089
38:    B     16000 2.089  2.089
39:    B     17000    NA  2.089
40:    B     18000    NA  2.089
41:    B     19000    NA  2.089
42:    B     20000    NA  2.089
    Type Range_val Value Value2
  • Related