Home > Software engineering >  Lookup deciles from one data.table in another data.table (maybe: how to take a by= argument into ano
Lookup deciles from one data.table in another data.table (maybe: how to take a by= argument into ano

Time:04-06

I have a data table with prices that looks like:

library("data.table")
Data <- data.table(id = c(1,2),
                   value = c(250, 450))

I have another data table with price decile distributions that looks like:

lookupdata <- data.table(decile = seq(0.1,1, by = 0.1),
                         value_lookup = seq(100,1000, by = 100))

I want to add a column to the first data.table that tells me in which price decile each id happens to be. I managed to code a loop through id that does this:

for(i in Data$id){
  Data[id == i, decile := min(lookupdata[Data[id == i,value] <= value_lookup, decile])]
  print(i)
}

But since my actual data has 23 million observations, this loop takes ages. I'm looking for an efficient data.table solution to this issue. All that needs to happen is something like: Lookup your value in the lookupdata, when you have found it, take the associated decile and assign it to your id.

Here is my attempt (feel free to ignore, I'm just looking for a solution):

Data[, wrong_decile := min(lookupdata[value_lookup >= Data[,value], decile]), by = id][]

The problem here is that the inner Data[, value] does not know which ID to return so I'm just taking the global minimum here. I just don't manage to get this into a by=id format without a loop through IDs because I don't know how to take this by=id argument inside another data.table within a filter. This might be my actual question but it feels like this should be much simpler then my attempt.

CodePudding user response:

I think you are looking for a rolling join

Data[, decile := lookupdata[Data, decile, on = .(value_lookup = value), roll = "nearest"]][]
#    id value decile
# 1:  1   250    0.2
# 2:  2   450    0.4
  • Related