I have a result set with month as first column. Some of the month are missing in the result. I need to add previous month record as the missing month till last month.
Current data:
Desired Output:
I have a sql but instead of filling for just missing month it is taking every rows into account and populate it.
select
to_char(generate_series(date_trunc('MONTH',to_date(period,'YYYYMMDD') interval '1' month),
date_trunc('MONTH',now() interval '1' day),
interval '1' month) - interval '1 day','YYYYMMDD') as period,
name,age,salary,rating
from( values ('20201205','Alex',35,100,'A '),
('20210110','Alex',35,110,'A'),
('20210512','Alex',35,999,'A '),
('20210625','Jhon',20,175,'B-'),
('20210922','Jhon',20,200,'B ')) v (period,name,age,salary,rating) order by 2,3,4,5,1;
Output of this query:
Can someone help in getting desired output.
Regards!!
CodePudding user response:
You can achieve this with a recursive cte like this:
with RECURSIVE ctetest as (SELECT * FROM (values ('2020-12-31'::date,'Alex',35,100,'A '),
('2021-01-31'::date,'Alex',35,110,'A'),
('2021-05-31'::date,'Alex',35,999,'A '),
('2021-06-30'::date,'Jhon',20,175,'B-'),
('2021-09-30'::date,'Jhon',20,200,'B ')) v (mth, emp, age, salary, rating)),
cte AS (
SELECT MIN(mth) AS mth, emp, age, salary, rating
FROM ctetest
GROUP BY emp, age, salary, rating
UNION
SELECT COALESCE(n.mth, (l.mth interval '1 day' interval '1 month' - interval '1 day')::date), COALESCE(n.emp, l.emp),
COALESCE(n.age, l.age), COALESCE(n.salary, l.salary), COALESCE(n.rating, l.rating)
FROM cte l
LEFT OUTER JOIN ctetest n ON n.mth = (l.mth interval '1 day' interval '1 month' - interval '1 day')::date
AND n.emp = l.emp
WHERE (l.mth interval '1 day' interval '1 month' - interval '1 day')::date <= (SELECT MAX(mth) FROM ctetest)
)
SELECT * FROM cte order by 2, 1;
Note that although ctetest is not itself recursive
, being only used to get the test data, if any cte among multiple ctes are recursive
, you must have the recursive keyword after the with
.
CodePudding user response:
You can use cross join lateral
to fill the gaps and then union all
with the original data.
WITH the_table (period, name, age, salary, rating) as ( values
('2020-12-01'::date, 'Alex', 35, 100, 'A '),
('2021-01-01'::date, 'Alex', 35, 110, 'A'),
('2021-05-01'::date, 'Alex', 35, 999, 'A '),
('2021-06-01'::date, 'Jhon', 20, 100, 'B-'),
('2021-09-01'::date, 'Jhon', 20, 200, 'B ')
),
t as (
select *, coalesce(
lead(period) over (partition by name order by period) - interval 'P1M',
max(period) over ()
) last_period
from the_table
)
SELECT lat::date period, name, age, salary, rating
from t
cross join lateral generate_series
(period interval 'P1M', last_period, interval 'P1M') lat
UNION ALL
SELECT * from the_table
ORDER BY name, period;
Please note that using integer
data type for a date column is sub-optimal. Better review your data design and use date
data type instead. You can then present it as integer if necessary.
period | name | age | salary | rating |
---|---|---|---|---|
2020-12-01 | Alex | 35 | 100 | A |
2021-01-01 | Alex | 35 | 110 | A |
2021-02-01 | Alex | 35 | 110 | A |
2021-03-01 | Alex | 35 | 110 | A |
2021-04-01 | Alex | 35 | 110 | A |
2021-05-01 | Alex | 35 | 999 | A |
2021-06-01 | Alex | 35 | 999 | A |
2021-07-01 | Alex | 35 | 999 | A |
2021-08-01 | Alex | 35 | 999 | A |
2021-09-01 | Alex | 35 | 999 | A |
2021-06-01 | Jhon | 20 | 100 | B- |
2021-07-01 | Jhon | 20 | 100 | B- |
2021-08-01 | Jhon | 20 | 100 | B- |
2021-09-01 | Jhon | 20 | 200 | B |