Home > Mobile >  How to do conditional joins based on column values
How to do conditional joins based on column values

Time:12-22

enter image description here

I am trying to join to a lookup table based on the field value of my base table "Score" being between two field values in the lookup table ("ATTRIBUTE_2" and "ATTRIBUTE_3").

Basically the R version of the following SQL:

SELECT base.PolicyNo
    ,base.Score
    ,lkp.value ScoreFactor
FROM base base
INNER JOIN lkp lkp
    ON base.PERIL = lkp.PERIL
    AND base.COVERAGE = lkp.COVERAGE
    AND base.LocStCd = lkp.ATTRIBUTE_1
    AND base.Score > lkp.ATTRIBUTE_2
    AND base.Score <= lkp.ATTRIBUTE_3

I'm very new to R and still trying to grasp a lot of the syntax, so please be detailed in your responses :)

I've tried

TIERLKP[DF1, on = c("PERIL","COVERAGE",ATTRIBUTE_1 = "LocStCd",ATTRIBUTE_2 < "Score",ATTRIBUTE_3 >= "TOTALTIERSCOREVA"), nomatch=NA] and R tells me that ATTRIBUTE_2 and ATTRIBUTE_3 can't be found

I also tried

u = TIERLKP[DF1, on = c("PERIL","COVERAGE",ATTRIBUTE_1 = "LocStCd"), nomatch=NA]
    u = u[u$ATTRIBUTE_2 < u$Score,]
    u = u[u$ATTRIBUTE_3 >= u$Score,]

Which results in massive duplication of rows and the need for a cartesian join...doesn't seem efficient.

Per Phil's comment I tried left_join(DF1,TIERLKP, by = c("PERIL","COVERAGE","LocStCd" = "ATTRIBUTE_1","Score" > "ATTRIBUTE_2", "Score" <= "ATTRIBUTE_3"),copy = TRUE) with/without quotes as well as changing the order but am still getting "Error: Join columns must be present in data."

CodePudding user response:

We may use the data.table syntax as below. Also, convert to data.table if the data is data.frame with setDT

library(data.table)
setDT(TIERLKP)
setDT(DF1)
TIERLKP[DF1, on = .(PERIL, COVERAGE,ATTRIBUTE_1 = LocStCd,
   ATTRIBUTE_2 < Score,
   ATTRIBUTE_3 >= TOTALTIERSCOREVA), nomatch=NA] 

In the devel version of dplyr, we can use join_by which does the non-equi join

library(dplyr)
inner_join(TIERLKP, DF1, by = join_by(PERIL, COVERAGE,
      ATTRIBUTE_1 = LocStCd,   
       ATTRIBUTE_2 < Score,
   ATTRIBUTE_3 >= TOTALTIERSCOREVA))
  • Related