Merge two dataframes by a closest value in R-CodePudding

I have two dataframes that I want to merge by the closest value in one column. The first dataframe (DF1) consists of individuals and their estimated individual risk ("risk"):

DF1<- data.frame(ID = c(1, 2, 3), risk = c(22, 40, 20))

ID      risk
1       22
2       40
3       20

The second dataframe (DF2) consists of population by age groups ("population_age") and the normal risks within each age group ("population_normal_risk"):

DF2<- data.frame(population_age = c("30-34","35-39","40-44"), population_normal_risk = c(15, 30, 45))

population_age      population_normal_risk
30-34                       15
35-39                       30
40-44                       45

What I want is to add a new column in the DF1 dataframe showing the population age group ("population_age") with the closest risk value ("population_normal_risk") to the estimated risk on each individual ("risk").

What I expected would be:

ID    risk     population_age_group
1     22             30-34
2     40             40-44
3     20             30-34

Thanks in advance!

CodePudding user response：

We can use findInterval.

First we need to calculate our break points at the halfway points between the population risk values:

breaks <- c(0, df2$population_normal_risk   c(diff(df2$population_normal_risk) / 2, Inf))

Then use findInterval to detect which bin our risks fall into:

matches <- findInterval(df1$risk, breaks)

Finally, write the matches in:

df1$population_age <- df2$population_age[matches]

Giving us:

df1
  ID risk population_age
1  1   22          30-34
2  2   40          40-44
3  3   20          30-34`

CodePudding user response：

We can try the code below using outer max.col

transform(
    DF1,
    population_age = DF2[max.col(-abs(outer(risk, DF2$population_normal_risk, `-`))), "population_age"]
)

which gives

  ID risk population_age
1  1   22          30-34
2  2   40          40-44
3  3   20          30-34