I am trying to create a view in postgreSQL with the requirements as below:
The table needs to show the same period last year data for every records.
Sample data:
date_sk | location_sk | division_sk | employee_type_sk | value
20180202 | 6 | 8 | 4 | 1
20180202 | 7 | 2 | 4 | 2
20190202 | 6 | 8 | 4 | 1
20190202 | 7 | 2 | 4 | 1
20200202 | 6 | 8 | 4 | 1
20200202 | 7 | 2 | 4 | 3
In the table, date_sk, location_sk, division_sk and employee_type_sk are super keys which form an unique record in the table. You can check the required output as below:
date_sk | location_sk | division_sk | employee_type_sk | value | value_last_year
20180202 | 6 | 8 | 4 | 1 | NULL
20180203 | 7 | 2 | 4 | 2 | NULL
20190202 | 6 | 8 | 4 | 1 | 1
20190203 | 7 | 3 | 4 | 1 | NULL
20200202 | 6 | 8 | 4 | 1 | 1
20200203 | 7 | 3 | 4 | 3 | 1
The records start on 20180202, therefore, the data for the same period last year is unavailable. At the 4th record, there is a difference in division_sk comparing with the same period last year - hence, the head_count_last_year is NULL. My current solution is to create a view from the sample data with an addition column as same_date_last_year then LEFT JOIN the same table. The SQL queries are below:
CREATE VIEW test_view AS
SELECT *,
CONCAT(LEFT(date_sk, 4) - 1, RIGHT(date_sk, 4)) AS same_date_last_year
FROM test_table
SELECT
test_view.date_sk,
test_view.location_sk,
test_view.division_sk,
test_view.employee_type_sk,
test_view.value,
test_table.value AS value_last_year
FROM test_view
LEFT JOIN test_table ON (test_view.same_date_last_year = test_table.date_sk)
We have a lot of data in the table. My solution above is unacceptable in terms of performance. Is there a different query which yields the same result and might improve the performance ?
CodePudding user response:
You could simply use a correlated subquery here which is likely best for performance:
select *,
(
select value from t t2
where t2.date_sk=t.date_sk - interval '1' year and
t2.location_sk=t.location_sk and
t2.division_sk=t.division_sk and
t2.employee_type_sk=t.employee_type_sk
) as value_last_year
from t
CodePudding user response:
WITH CTE(DATE_SK,LOCATION_SK,DIVISION_SK,EMPLOYEE_TYPE_SK,VALUE)AS
(
SELECT CAST('20180202' AS DATE),6,8,4,1 UNION ALL
SELECT CAST('20180203'AS DATE),7,2,4,2 UNION ALL
SELECT CAST('20190202'AS DATE),6,8,4,1 UNION ALL
SELECT CAST('20190203'AS DATE),7,2,4,1 UNION ALL
SELECT CAST('20200202'AS DATE),6,8,4,1 UNION ALL
SELECT CAST('20200203'AS DATE),7,2,4,3
)
SELECT C.DATE_SK,C.LOCATION_SK,C.DIVISION_SK,C.EMPLOYEE_TYPE_SK,C.VALUE,
LAG(C.VALUE)OVER(PARTITION BY C.LOCATION_SK,C.DIVISION_SK,C.EMPLOYEE_TYPE_SK ORDER BY C.DATE_SK ASC)LAGG
FROM CTE AS C
ORDER BY C.DATE_SK ASC;
Could you please try if the above is suitable for you. I assume,DATE_SK is a date column or can be CAST to a date