I have a table that contains an item ID, the date, and the price. Every item has its price recorded for each day, but I want to select only the items whose price has not changed, and to show the number of days without a change.
An example of the table:
id | Price | Day | Month | Year |
---|---|---|---|---|
asdf | 10 | 03 | 11 | 2022 |
asdr1 | 8 | 03 | 11 | 2022 |
asdf | 10 | 02 | 11 | 2022 |
asdr1 | 8 | 02 | 11 | 2022 |
asdf | 10 | 01 | 11 | 2022 |
asdr1 | 7 | 01 | 11 | 2022 |
asdf | 9 | 31 | 10 | 2022 |
asdr1 | 8 | 31 | 10 | 2022 |
asdf | 8 | 31 | 10 | 2022 |
asdr1 | 8 | 31 | 10 | 2022 |
The output I want is:
Date | id | Last_Price | First_Price_Appearance | DaysWOchange |
---|---|---|---|---|
2022-11-03 | asdf | 10 | 2022-11-01 | 2 |
2022-11-03 | asdr1 | 8 | 2022-11-02 | 1 |
The solution needs to run quickly, so what are some efficient ways to solve this, considering that the table has millions of rows and that some items have not changed their price in years?
The efficiency problem arises because, for each id, I would need to scan the entire table looking for the first row in which the price changed, and repeat this for thousands of items.
I am currently calculating the difference between the latest price and the whole history, but this becomes slow and may take several minutes to run over all of the history. The main concern for this problem is efficiency.
CodePudding user response:
DECLARE @table TABLE (id NVARCHAR(5), Price INT, Date DATE)
INSERT INTO @table (id, Price, Date) VALUES
('asdf', 10, '2022-10-20'),
('asdr1', 8, '2022-10-15'),
('asdf', 10, '2022-11-03'),
('asdr1', 8, '2022-11-02'),
('asdf', 10, '2022-11-02'),
('asdr1', 8, '2022-11-02'),
('asdf', 10, '2022-11-01'),
('asdr1', 7, '2022-11-01'),
('asdf', 9, '2022-10-31'),
('asdr1', 8, '2022-10-31'),
('asdf', 8, '2022-10-31'),
('asdr1', 8, '2022-10-31')
Tables of data are useful, but they are even more useful if you put the demo data into an object.
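The question's table stores Day, Month and Year in separate columns, while the demo data above uses a single DATE column. A minimal sketch of deriving that column, assuming the three parts are integers and SQL Server 2012 or later (for DATEFROMPARTS); the table variable @raw is a hypothetical stand-in for the real table:

```sql
-- Hypothetical stand-in for the question's table (two sample rows only).
DECLARE @raw TABLE (id NVARCHAR(5), Price INT, [Day] INT, [Month] INT, [Year] INT);
INSERT INTO @raw (id, Price, [Day], [Month], [Year]) VALUES
('asdf', 10, 3, 11, 2022),
('asdr1', 8, 3, 11, 2022);

-- Combine the three parts into one DATE so window functions can order by it.
SELECT id, Price, DATEFROMPARTS([Year], [Month], [Day]) AS [Date]
FROM @raw;
```

If the parts are stored as strings such as '03', they would need a CAST to INT first.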
SELECT id, FirstDate, LastChange, DaysSinceChange, Price
FROM (
    SELECT id,
           MIN(Date) OVER (PARTITION BY id ORDER BY Date) AS FirstDate,
           Date AS LastChange, Price,
           CASE WHEN LEAD(Date, 1) OVER (PARTITION BY id ORDER BY Date) IS NULL
                THEN DATEDIFF(DAY, Date, CURRENT_TIMESTAMP)
                ELSE DATEDIFF(DAY, LAG(Date) OVER (PARTITION BY id ORDER BY Date), Date)
           END AS DaysSinceChange,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY Date DESC) AS rn
    FROM @table
) a
WHERE rn = 1
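This is a quick way to get what you want. If you execute the subquery by itself, you can see the full history. Note that for each id's latest row, DaysSinceChange is measured against CURRENT_TIMESTAMP, so that value depends on when the query is run.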
id FirstDate LastChange Price DaysSinceChange
-------------------------------------------------------
asdf 2022-10-20 2022-11-03 10 0
asdr1 2022-10-15 2022-11-02 8 1
SELECT id,
       MIN(Date) OVER (PARTITION BY id ORDER BY Date) AS FirstDate,
       Date AS LastChange, Price,
       CASE WHEN LEAD(Date, 1) OVER (PARTITION BY id ORDER BY Date) IS NULL
            THEN DATEDIFF(DAY, Date, CURRENT_TIMESTAMP)
            ELSE DATEDIFF(DAY, LAG(Date) OVER (PARTITION BY id ORDER BY Date), Date)
       END AS DaysSinceChange,
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY Date DESC) AS rn
FROM @table
id FirstDate LastChange Price DaysSinceChange rn
------------------------------------------------------
asdf 2022-10-20 2022-11-03 10 0 1
asdf 2022-10-20 2022-11-02 10 1 2
asdf 2022-10-20 2022-11-01 10 1 3
asdf 2022-10-20 2022-10-31 9 11 4
asdf 2022-10-20 2022-10-31 8 0 5
asdf 2022-10-20 2022-10-20 10 NULL 6
asdr1 2022-10-15 2022-11-02 8 1 1
asdr1 2022-10-15 2022-11-02 8 1 2
asdr1 2022-10-15 2022-11-01 7 1 3
asdr1 2022-10-15 2022-10-31 8 16 4
asdr1 2022-10-15 2022-10-31 8 0 5
asdr1 2022-10-15 2022-10-15 8 NULL 6
CodePudding user response:
You can use lag() and a windowed max():
select id, date, price
from (select t.*,
             max(case when price <> lag_price then date end) over (partition by id) as price_change_date
      from (select t.*,
                   lag(price) over (partition by id order by date) as lag_price
            from t
           ) t
     ) t
where price_change_date is null;
This calculates the most recent date on which the price changed for each id (NULL if the price never changed). It then keeps only the rows of ids whose price has never changed. The window functions avoid a separate scan per id and should be efficient, taking advantage of indexes on (id, date) and (id, price, date).
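If the goal is the exact output from the question (latest price, the first date of the current unbroken run of that price, and the days in between), one possible sketch follows. It is not the query above but a variation on the same idea: find the last date on which the price differed from the latest price, then take the earliest date after it. The names @prices, Last_Diff_Date and the CTE names are made up for illustration, and it assumes SQL Server 2012 or later (for FIRST_VALUE) and a single price per id on the latest date.

```sql
-- Sample rows from the question, with Day/Month/Year combined into one DATE column.
DECLARE @prices TABLE (id NVARCHAR(5), Price INT, [Date] DATE);
INSERT INTO @prices (id, Price, [Date]) VALUES
('asdf', 10, '2022-11-03'), ('asdr1', 8, '2022-11-03'),
('asdf', 10, '2022-11-02'), ('asdr1', 8, '2022-11-02'),
('asdf', 10, '2022-11-01'), ('asdr1', 7, '2022-11-01'),
('asdf',  9, '2022-10-31'), ('asdr1', 8, '2022-10-31'),
('asdf',  8, '2022-10-31'), ('asdr1', 8, '2022-10-31');

WITH latest AS (
    -- Attach each id's latest price and latest date to every one of its rows.
    SELECT id, Price, [Date],
           FIRST_VALUE(Price) OVER (PARTITION BY id ORDER BY [Date] DESC) AS Last_Price,
           MAX([Date]) OVER (PARTITION BY id) AS Last_Date
    FROM @prices
),
marked AS (
    -- Last date on which the price differed from the latest price (NULL if never).
    SELECT id, [Date], Last_Price, Last_Date,
           MAX(CASE WHEN Price <> Last_Price THEN [Date] END)
               OVER (PARTITION BY id) AS Last_Diff_Date
    FROM latest
)
-- Keep only the rows of the current run and collapse them to one row per id.
SELECT Last_Date AS [Date], id, Last_Price,
       MIN([Date]) AS First_Price_Appearance,
       DATEDIFF(DAY, MIN([Date]), Last_Date) AS DaysWOchange
FROM marked
WHERE Last_Diff_Date IS NULL OR [Date] > Last_Diff_Date
GROUP BY id, Last_Date, Last_Price
ORDER BY id;
```

On these sample rows this should reproduce the two rows of the desired output in the question. If an id has two different prices on its latest date, FIRST_VALUE picks one arbitrarily, so a tie-breaker column would be needed in that case.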