Home > database >  Fastest way to join 2 tables to get the most recent record
Fastest way to join 2 tables to get the most recent record

Time:12-31

I have 2 tables:

First one is bom

Article
AB
CD
EF
GH
CREATE TABLE bom
(
    Article VARCHAR(250)
);

INSERT INTO bom (Article)
 VALUES
 ('AB'),
 ('CD'),
 ('EF'),
 ('GH');

Second one is purchases

Article OrderDate Price
AB '2020-01-10' 12
AB '2020-01-05' 10
AB '2020-01-03' 8
EF '2020-01-01' 7
 CREATE TABLE purchases
(
    Article VARCHAR(250),
    OrderDate DATE,
    Price DOUBLE
);

INSERT INTO purchases (Article, OrderDate, Price)
 VALUES
 ('AB', '2020-01-10', 12.0),
 ('AB', '2020-01-05', 10.0),
 ('AB', '2020-01-03', 8.0),
 ('EF', '2020-01-01', 7.0);

I want to extract the most recent price for each row of Article at a given date. For instance, at @evalDay = '2020-01-04', I want to get

Article OrderDate Price
AB '2020-01-03' 8
EF '2020-01-01' 7

I've managed it to work using a window function (row_number() over), but the performance is not as good as I need. This is a simplified example, but my bom table has a few hundred of rows, whereas the purchases has about 1 million rows. On my computer, it takes approx. 50ms to execute. Of course I use indexes and compound indexes.

My solution:

set @evalDay = '2020-01-04';
with cte (Article, OrderDate, Price, rn) as (
    select purchases.*,
        row_number() over (
            partition by bom.article
            order by purchases.OrderDate desc
        ) as rn
    from bom
        join purchases on bom.Article = purchases.Article
    where purchases.OrderDate <= @evalDay
)
select *
from cte
where rn = 1;

In this case, what's the fastest approach to get the answer?

CodePudding user response:

I would try the following approaches:

  • Move the join to outside
with cte (Article, OrderDate, Price, rn) as (
select *,
       row_number() over (partition by article order by OrderDate desc) as rn
    from purchases
   where OrderDate <= @evalDay)
select cte.*, bom.*
  from cte
  join bom
    on cte.Article = bom.Article
 where cte.rn = 1;
  • Remove the join if no additional columns needed from bom
with cte (Article, OrderDate, Price, rn) as (
select *,
       row_number() over (partition by article order by OrderDate desc) as rn
    from purchases
   where OrderDate <= @evalDay)
select *
  from cte
 where rn = 1;

If the above still doesn't perform, consider creating a table to store the result (Article,OrderOdate,Price,evalDate) partitioned by evalDate.

CodePudding user response:

If you have more than one row in purchases which has the same latest date for the same Article, this might not give you what you want but it's a fairly simple query....

set @evalDay = '2020-01-04';
Select a.*
From purchases a, 
  (select Article, Max(OrderDate) AS ODate
  from purchases
  where OrderDate <= @evalDay
  group by Article) b
Where a.Article = b.Article
And a.OrderDate = b.ODate;
  • Related