Home > Software engineering >  Finding closest value in another table
Finding closest value in another table

Time:12-03

I'm quite new to SQL so bare with me. What I'm trying to do is return the value closest to another value in a different table for every record.

I'll show a simplified example of my two tables for clarification

First table is the one that I want the value ENTRY_YEAR matched to:

ID ENTRY_VALUE
1001 1900
1002 2000

And the second table:

ID ENTRY_VALUE STATUS
1001 1880 SUCCES
1001 1930 FAIL
1001 1940 SUCCES
1002 1960 SUCCES
1002 1980 FAIL

So the end result I'm looking for is:

ID ENTRY_VALUE STATUS
1001 1880 SUCCES
1002 1980 FAIL

I have currently only managed to link the id's together but can't find a way to compare the ENTRY_VALUE in both tables and return the one closest to the Table1 entry.

So only this:

SELECT * from Table2

INNER JOIN  Table1 ON (Table2.ID = Table1.ID) 

Once again my bad for the basic question, I have googled right about everything but can't get it to work so any help is very welcome!

CodePudding user response:

First attempt

This is a (slower performing) query. First attempt! This is an approach using a "correlated subquery" so it runs the inner query for each row of the outer query. The strategy is, for each row, to determine what the min value is we are looking for, and then select only the rows that fit that criteria. But such queries can be slow at runtime, although the logic is very clean.

select 
    a.id,
    b.entry_value,
    b.[status]
from 
    Foo a 
    inner join Bar b
    on a.id = b.id
where 
    abs(a.entry_value - b.entry_value) =
        (select min(abs(t1.entry_value-t2.entry_value))
            from Foo t1 
            inner join Bar t2
            on t1.id = t2.id
            where
                t1.id = a.id
            group by t1.id)
Second attempt

If you have many rows (in the tens of thousands or in any case if the previous query is just too slow), then this next one should be better performing. Second Attempt! If you run the two inner queries by themselves, you will probably see the strategy here of how we are joining them to get the desired result.

select A.Id, A.entry_value, A.[status]
from    
    (
    select t1.id, t2.entry_value, abs(t1.entry_value-t2.entry_value) as diff, t2.[status]
    from Foo t1 
    inner join Bar t2
    on t1.id = t2.id
    ) A
    inner join
    (
    select t3.id, min(abs(t3.entry_value-t4.entry_value)) as diff
    from Foo t3
    inner join Bar t4
    on t3.id = t4.id
    group by t3.id
    ) B
    on A.id = B.id
    and A.diff = B.diff
Notes

I would probably not try to write either of these queries in MSAccess "Design view" although if I had too I am sure I could. But generally, this is a case where I would write the query "by hand" and paste it into your query directly using MSAccess "SQL view".

Beware that ties will result in two rows! Example: First table has (1003,2000) Second table has (1003, 1990, 'success') and (1003, 2010, 'fail') You will have a result with two rows, one with success and the other with fail (!)

So you really should test with your data and look for such cases that might produce such ties (and decide what to do, if necessary).

Btw...

just for fun, here's how you might go for it in SQL Server. But I think this will NOT work in MSAccess, unfortunately.

select
    T.id,
    T.entry_value,
    T.[status]
from
(
    select 
        t1.id,
        t2.entry_value,
        abs(t1.entry_value-t2.entry_value) as diff,
        t2.[status],
        rank() over (partition by t1.id order by abs(t1.entry_value-t2.entry_value)) as seq
        from #Foo t1 
        inner join #Bar t2
        on t1.id = t2.id
) T
where T.seq = 1;

CodePudding user response:

Use a simple subquery to find the minimum offset:

Select 
    tbl1.ID, 
    tbl2.ENTRY_VALUE,
    tbl2.STATUS
From 
    tbl1 
Inner Join 
    tbl2 On tbl1.ID = tbl2.ID
Where 
    Abs([tbl1].[ENTRY_VALUE] - [tbl2].[ENTRY_VALUE]) =
        (Select Min(Abs([tbl1].[ENTRY_VALUE] - [T2].[ENTRY_VALUE])) As Offset
        From tbl2 As T2 
        Where T2.ID = tbl1.ID);

Output:

ID ENTRY_VALUE STATUS
1001 1880 SUCCES
1002 1980 FAIL

Note, that if the minimum offset for an ID exists twice, both records having this offset will be returned. Thus, you may have to aggregate the output.

  • Related