Home > Back-end >  sql query to fill sparse data in timeline
sql query to fill sparse data in timeline

Time:10-15

I have a table holding various information change related to employees. Some information change over time, but not alltogether, and changes occur periodically but not regularly. Changes are recorded by date, and if an item is not changed for the given employee at the given time, then the item's value is Null for that record. Say it looks like this:

employeeId Date Salary CommuteDistance
1 2000-01-01 1000 Null
2 2000-01-15 2000 20
3 2000-01-30 3000 Null
2 2010-02-15 2100 Null
3 2010-03-30 Null 30
1 2020-02-01 1100 10
1 2030-03-01 Null 100

Now, how can I write a query to fill the null values with the most recent non-null values for all employees at all dates, while keeping the value Null if there is no such previous non-null value? It should look like:

employeeId Date Salary CommuteDistance
1 2000-01-01 1000 Null
2 2000-01-15 2000 20
3 2000-01-30 3000 Null
2 2010-02-15 2100 20
3 2010-03-30 3000 30
1 2020-02-01 1100 10
1 2030-03-01 1100 100

(Note how the bolded values are taken over from previous records of same employee).

I'd like to use the query inside a view, then in turn query that view to get the picture at an arbitrary date (e.g., what were the salary and commute distance for the employees on 2021-08-17? - I should be able to do that, but I'm unable to build the view). Or, is there a better way to acomplish this?

There's no point in showing my attempts, since I'm quite inexperienced with advanced sql (I assume the solution empolys advanced knowledge, since I found my basic knowledge insufficient for this) and I got nowhere near the desired result.

CodePudding user response:

You may get the last not null value for employee salary or CommuteDistance using the following:

SELECT T.employeeId, T.Date,
       COALESCE(Salary, MAX(Salary) OVER (PARTITION BY employeeId, g1)) AS Salary,
       COALESCE(CommuteDistance, MAX(CommuteDistance) OVER (PARTITION BY employeeId, g2)) AS CommuteDistance
FROM
  (
    SELECT *,
      MAX(CASE WHEN Salary IS NOT null THEN Date END) OVER (PARTITION BY employeeId ORDER BY Date) AS g1,
      MAX(CASE WHEN CommuteDistance IS NOT null THEN Date END) OVER (PARTITION BY employeeId ORDER BY Date) AS g2
    FROM TableName 
  ) T
ORDER BY Date

See a demo.

CodePudding user response:

We group by employeeId and by Salary/CommuteDistance and all the nulls after them by Date. Then we fill in the blanks.

select   employeeId
        ,Date
        ,max(Salary) over(partition by employeeId, s_grp)          as Salary
        ,max(CommuteDistance) over(partition by employeeId, d_grp) as CommuteDistance
from     (
         select  *
                 ,count(case when Salary is not null then 1 end) over(partition by employeeId order by Date)          as s_grp
                 ,count(case when CommuteDistance is not null then 1 end) over(partition by employeeId order by Date) as d_grp
         from    t
         ) t
order by Date
employeeId Date Salary CommuteDistance
1 2000-01-01 1000 null
2 2000-01-15 2000 20
3 2000-01-30 3000 null
2 2010-02-15 2100 20
3 2010-03-30 3000 30
1 2020-02-01 1100 10
1 2030-03-01 1100 100

Fiddle

  • Related