I have a table holding various information change related to employees. Some information change over time, but not alltogether, and changes occur periodically but not regularly. Changes are recorded by date, and if an item is not changed for the given employee at the given time, then the item's value is Null for that record. Say it looks like this:
employeeId | Date | Salary | CommuteDistance |
---|---|---|---|
1 | 2000-01-01 | 1000 | Null |
2 | 2000-01-15 | 2000 | 20 |
3 | 2000-01-30 | 3000 | Null |
2 | 2010-02-15 | 2100 | Null |
3 | 2010-03-30 | Null | 30 |
1 | 2020-02-01 | 1100 | 10 |
1 | 2030-03-01 | Null | 100 |
Now, how can I write a query to fill the null values with the most recent non-null values for all employees at all dates, while keeping the value Null if there is no such previous non-null value? It should look like:
employeeId | Date | Salary | CommuteDistance |
---|---|---|---|
1 | 2000-01-01 | 1000 | Null |
2 | 2000-01-15 | 2000 | 20 |
3 | 2000-01-30 | 3000 | Null |
2 | 2010-02-15 | 2100 | 20 |
3 | 2010-03-30 | 3000 | 30 |
1 | 2020-02-01 | 1100 | 10 |
1 | 2030-03-01 | 1100 | 100 |
(Note how the bolded values are taken over from previous records of same employee).
I'd like to use the query inside a view, then in turn query that view to get the picture at an arbitrary date (e.g., what were the salary and commute distance for the employees on 2021-08-17? - I should be able to do that, but I'm unable to build the view). Or, is there a better way to acomplish this?
There's no point in showing my attempts, since I'm quite inexperienced with advanced sql (I assume the solution empolys advanced knowledge, since I found my basic knowledge insufficient for this) and I got nowhere near the desired result.
CodePudding user response:
You may get the last not null value for employee salary or CommuteDistance using the following:
SELECT T.employeeId, T.Date,
COALESCE(Salary, MAX(Salary) OVER (PARTITION BY employeeId, g1)) AS Salary,
COALESCE(CommuteDistance, MAX(CommuteDistance) OVER (PARTITION BY employeeId, g2)) AS CommuteDistance
FROM
(
SELECT *,
MAX(CASE WHEN Salary IS NOT null THEN Date END) OVER (PARTITION BY employeeId ORDER BY Date) AS g1,
MAX(CASE WHEN CommuteDistance IS NOT null THEN Date END) OVER (PARTITION BY employeeId ORDER BY Date) AS g2
FROM TableName
) T
ORDER BY Date
See a demo.
CodePudding user response:
We group by employeeId
and by Salary/CommuteDistance
and all the nulls
after them by Date
. Then we fill in the blanks.
select employeeId
,Date
,max(Salary) over(partition by employeeId, s_grp) as Salary
,max(CommuteDistance) over(partition by employeeId, d_grp) as CommuteDistance
from (
select *
,count(case when Salary is not null then 1 end) over(partition by employeeId order by Date) as s_grp
,count(case when CommuteDistance is not null then 1 end) over(partition by employeeId order by Date) as d_grp
from t
) t
order by Date
employeeId | Date | Salary | CommuteDistance |
---|---|---|---|
1 | 2000-01-01 | 1000 | null |
2 | 2000-01-15 | 2000 | 20 |
3 | 2000-01-30 | 3000 | null |
2 | 2010-02-15 | 2100 | 20 |
3 | 2010-03-30 | 3000 | 30 |
1 | 2020-02-01 | 1100 | 10 |
1 | 2030-03-01 | 1100 | 100 |