I have two dataframes that I want to merge by the closest value in one column. The first dataframe (DF1) consists of individuals and their estimated individual risk ("risk"):
DF1<- data.frame(ID = c(1, 2, 3), risk = c(22, 40, 20))
ID risk
1 22
2 40
3 20
The second dataframe (DF2) consists of population by age groups ("population_age") and the normal risks within each age group ("population_normal_risk"):
DF2<- data.frame(population_age = c("30-34","35-39","40-44"), population_normal_risk = c(15, 30, 45))
population_age population_normal_risk
30-34 15
35-39 30
40-44 45
What I want is to add a new column in the DF1 dataframe showing the population age group ("population_age") with the closest risk value ("population_normal_risk") to the estimated risk on each individual ("risk").
What I expected would be:
ID risk population_age_group
1 22 30-34
2 40 40-44
3 20 30-34
Thanks in advance!
CodePudding user response:
We can use findInterval
.
First we need to calculate our break points at the halfway points between the population risk values:
breaks <- c(0, df2$population_normal_risk c(diff(df2$population_normal_risk) / 2, Inf))
Then use findInterval
to detect which bin our risks fall into:
matches <- findInterval(df1$risk, breaks)
Finally, write the matches in:
df1$population_age <- df2$population_age[matches]
Giving us:
df1
ID risk population_age
1 1 22 30-34
2 2 40 40-44
3 3 20 30-34`
CodePudding user response:
We can try the code below using outer
max.col
transform(
DF1,
population_age = DF2[max.col(-abs(outer(risk, DF2$population_normal_risk, `-`))), "population_age"]
)
which gives
ID risk population_age
1 1 22 30-34
2 2 40 40-44
3 3 20 30-34