I got a df such as this
structure(list(id = c(4375, 4375, 4375, 4375), time = c(0, 88,
96, 114)), class = "data.frame", row.names = c(NA, -4L))
and a second df (df2) such as this
structure(list(id2 = c(4375, 4375, 4375, 4375, 4375, 4375, 4375,
4375, 4375, 4375), time2 = c(0, 2, 87, 88, 94, 97, 101, 104,
109, 114), score2 = c(0.028, 0.057, 0.057, 0.057, 0.057, 0.057,
0.057, 0.085, 0.085, 0.085)), class = "data.frame", row.names = c(NA,
-10L))
I want for each id, to map the time column in both df and creating a score column in df1 and assigning the value of score from df2, when the time value in df1 is equal or less than the time in df2. I want the final df to look like this
structure(list(id3 = c(4375, 4375, 4375, 4375), time3 = c(0,
88, 96, 116), score3 = c(0.028, 0.057, 0.057, 0.085)), class = "data.frame", row.names = c(NA,
-4L))
CodePudding user response:
We could do it with fuzzyjoin
to handle near matches:
library(fuzzyjoin)
library(dplyr)
fuzzy_left_join(df, df2, by = c("time"="time2"), match_fun = list(`<=`)) %>%
group_by(time2) %>%
slice(n()) %>%
ungroup() %>%
group_by(time) %>%
filter(row_number()==1) %>%
select(id3=id, time3=time, score3 = score2)
id3 time3 score3
<dbl> <dbl> <dbl>
1 4375 0 0.028
2 4375 88 0.057
3 4375 96 0.057
4 4375 114 0.085
CodePudding user response:
For exact matches in the id and time columns, we can use left_join
:
library(dplyr)
left_join(df, df2, by=c('id'='id2', 'time'='time2'))
id time score2
1 4375 0 0.028
2 4375 88 0.057
3 4375 96 NA
4 4375 114 0.085
However, as we can see, this wont handle near-matches well. Please define, what do you mean by " when the time value in df1 is equal or less than the time in df2.". equal or less than what? How do we match the elements for the "equal or less then" comparison?