I have data in Redshift that I'm aggregating to the Year-Quarter level i.e. number of items by Year-Quarter
I need to show a continuous trend and hence I need to fill-in the gaps in Year-Quarter. The picture below should give a clearer idea of my current data and desired output.
How can I achieve this in Redshift SQL?
CodePudding user response:
This is known as forward filling values:
CREATE TABLE #Temp
(
[YQ] nvarchar(5),
[items] int
)
INSERT INTO #Temp Values ('20201',10),('20204', 15),('20213', 25),('20222', 30)
---------------------------------------------------------------------------------
DECLARE @start int, @end int, @starty int, @endy int
SELECT @start=1, @end=4
SELECT @starty=MIN(Substring(YQ,0,5)), @endy=MIN(Substring(YQ,0,5)) from #Temp
;With cte1(y) as
(
Select @starty as y
union all
Select y 1
from cte1
where y <= @endy 1
)
, cte2(n) as
(
Select @start as n
union all
Select n 1
from cte2
where n < @end
)
SELECT t1.YQ AS 'Year-Quarter',
CASE WHEN t2.items is null then (SELECT TOP 1 MAX(items) from #Temp WHERE items is not null and YQ < t1.YQ) ELSE t2.items END AS '# Items'
FROM
(
SELECT CAST(cte1.y AS nvarchar(4)) CAST(cte2.n AS nvarchar(1)) AS YQ
FROM cte1, cte2
) t1
LEFT JOIN #Temp t2 ON t2.YQ = t1.YQ
WHERE t1.YQ <= (SELECT MAX(YQ) FROM #Temp)
ORDER BY t1.YQ, t2.items
CodePudding user response:
A query like this should do the trick:
create table test (yq int, items int);
INSERT INTO test Values (20201,10),(20204, 15),(20213, 25),(20222, 30);
with recursive quarters(q) as (
select min(yq) as q
from test
union all
select decode(right(q::text, 1), 4, q 7, q 1) as q
from quarters
where q < (select max(yq) from test)
)
select q as yq, decode(items is null, true,
lag(items ignore nulls) over (order by q), items) as items
from test t
right join quarters q
on t.yq = q.q
order by q;
It uses a recursive CTE to generate the quarters range needed, right joins this with the source data, and then uses a LAG() window function to populate the items if the value is NULL.