Home > OS >  Inexact joining data based on greater equal condition
Inexact joining data based on greater equal condition

Time:11-29

I have some values in df:

# A tibble: 7 × 1
   var1
  <dbl>
1     0
2    10
3    20
4   210
5   230
6   266
7   267

that I would like to compare to a second dataframe called value_lookup

# A tibble: 4 × 2
   var1 value
  <dbl> <dbl>
1     0     0
2   200    10
3   230    20
4   260    30

In particual I would like to make a join based on >= meaning that a value that is greater or equal to the number in var1 gets a values of x. E.g. take the number 210 of the orginal dataframe. Since it is >= 200 and <230 it would get a value of 10.

Here is the expected output:

  var1 value
1    0     0
2   10     0
3   20     0
4  210    10
5  230    20
6  266    30
7  267    30

I thought it should be doable using {fuzzyjoin} but I cannot get it done.

value_lookup <- tibble(var1 = c(0, 200,230,260),
                       value = c(0,10,20,30))

df <- tibble(var1 = c(0,10,20,210,230,266,267))

library(fuzzyjoin)
fuzzyjoin::fuzzy_left_join(
  x = df, 
  y = value_lookup ,
  by = "var1",
  match_fun = list(`>=`)
) 

CodePudding user response:

An option is also findInterval:

df$value <- value_lookup$value[findInterval(df$var1, value_lookup$var1)]

Output:

  var1 value
1    0     0
2   10     0
3   20     0
4  210    10
5  230    20
6  266    30
7  267    30

As you're mentioning joins, you could also do a rolling join via data.table with the argument roll = T which would look for same or closest value preceding var1 in your df:

library(data.table)

setDT(value_lookup)[setDT(df), on = 'var1', roll = T]

CodePudding user response:

You can use cut:

df$value <- value_lookup$value[cut(df$var1, 
                                   c(value_lookup$var1, Inf), 
                                   right=F)]
# # A tibble: 7 x 2
#    var1 value
#   <dbl> <dbl>
# 1     0     0
# 2    10     0
# 3    20     0
# 4   210    10
# 5   230    20
# 6   266    30
# 7   267    30
  • Related