We have two dataframes df1
and df2
.
df1 <- data.frame(x = c(10, 12, 52, 102))
df2 <- data.frame(y = c(11, 15, 30, 130))
We want to filter rows in df1$x
if they are within plus or minus 10 values of any value in 'df2$y'
so the result of the filter on df1
would be :
data.frame(x = c(10, 12))
because 10 and 12 in df1$x
are within /- 10 of 11 and 15 in df2$y
.
If we wanted to only filter rows that matched "exactly", we could use dplyr::left_join
. But how can we do this if we want to use a range of values (i.e., /- 10) as described above?
CodePudding user response:
We may use power_inner_join
library(powerjoin)
library(dplyr)
power_inner_join(df1, df2, by = c(~ .x$x <= (.y$y 10 ) &
.x$x >= (.y$y - 10))) %>%
distinct(x)
x
1 10
2 12
Or do a non-equi join with data.table
library(data.table)
setDT(df1)[setDT(df2)[, .(ystart = y-10, yend = y 10)],
flag := TRUE, on = .(x >= ystart, x <= yend)]
df1[as.logical(flag), .(x)]
x
<num>
1: 10
2: 12
CodePudding user response:
library(sqldf)
#> Loading required package: gsubfn
#> Loading required package: proto
#> Loading required package: RSQLite
#> Warning: package 'RSQLite' was built under R version 4.1.2
df1 <- data.frame(x = c(10, 12, 52, 102))
df2 <- data.frame(y = c(11, 15, 30, 130))
sqldf('
select
distinct
a.*
from
df1 a
join df2 b
on a.x between b.y - 10 and b.y 10
')
#> x
#> 1 10
#> 2 12
Created on 2022-07-26 by the reprex package (v2.0.1)
CodePudding user response:
## need drop = FALSE because df1 has just one column
df1[abs(df1$x - df2$y) <= 10, , drop = FALSE]
# x
#1 10
#2 12
## no need to consider "drop"
subset(df1, abs(df1$x - df2$y) <= 10)