Mapping and assigning values between columns of different lengths using R-CodePudding

I got a df such as this

structure(list(id = c(4375, 4375, 4375, 4375), time = c(0, 88, 
96, 114)), class = "data.frame", row.names = c(NA, -4L))

and a second df (df2) such as this

structure(list(id2 = c(4375, 4375, 4375, 4375, 4375, 4375, 4375, 
4375, 4375, 4375), time2 = c(0, 2, 87, 88, 94, 97, 101, 104, 
109, 114), score2 = c(0.028, 0.057, 0.057, 0.057, 0.057, 0.057, 
0.057, 0.085, 0.085, 0.085)), class = "data.frame", row.names = c(NA, 
-10L))

I want for each id, to map the time column in both df and creating a score column in df1 and assigning the value of score from df2, when the time value in df1 is equal or less than the time in df2. I want the final df to look like this

structure(list(id3 = c(4375, 4375, 4375, 4375), time3 = c(0, 
88, 96, 116), score3 = c(0.028, 0.057, 0.057, 0.085)), class = "data.frame", row.names = c(NA, 
-4L))

CodePudding user response：

We could do it with fuzzyjoin to handle near matches:

library(fuzzyjoin)
library(dplyr)

fuzzy_left_join(df, df2, by = c("time"="time2"), match_fun = list(`<=`)) %>% 
  group_by(time2) %>% 
  slice(n()) %>% 
  ungroup() %>% 
  group_by(time) %>% 
  filter(row_number()==1) %>% 
  select(id3=id, time3=time, score3 = score2)

    id3 time3 score3
  <dbl> <dbl>  <dbl>
1  4375     0  0.028
2  4375    88  0.057
3  4375    96  0.057
4  4375   114  0.085

CodePudding user response：

For exact matches in the id and time columns, we can use left_join:

library(dplyr)

left_join(df, df2, by=c('id'='id2', 'time'='time2'))

    id time score2
1 4375    0  0.028
2 4375   88  0.057
3 4375   96     NA
4 4375  114  0.085

However, as we can see, this wont handle near-matches well. Please define, what do you mean by " when the time value in df1 is equal or less than the time in df2.". equal or less than what? How do we match the elements for the "equal or less then" comparison?