Home > Back-end >  Update gaps in sequential table
Update gaps in sequential table

Time:02-04

I have a table that contains employee bank data

Employee      |Bank          |Date           |Delta
---------------------------------------------------
Smith         |Vacation      |2023-01-01     |15.0
Smith         |Vacation      |2023-01-02     |Null
Smith         |Vacation      |2023-01-03     |Null
Smith         |Vacation      |2023-01-04     |7.5

I would like to write a statement so that I can update 2023-01-02 and 2023-01-03 with the Delta value from January 1. Essentially, I want to use the value from the most recent row that isn't > than the date on the row.

Once complete, I want the table to look like this:

Employee      |Bank          |Date           |Delta
---------------------------------------------------
Smith         |Vacation      |2023-01-01     |15.0
Smith         |Vacation      |2023-01-02     |15.0
Smith         |Vacation      |2023-01-03     |15.0
Smith         |Vacation      |2023-01-04     |7.5

The source table has a unique index consisting of Employee, Bank and Date descending. There could be up to 2 billion rows in the table.

I currently update the table with the following, but I am wondering if there is a more efficient way to do so?

WITH cte_date
     AS (SELECT dd.date_key, 
                db.balance_key, 
                feb.employee_key
         FROM shared.dim_date dd
              CROSS JOIN
         (
             SELECT DISTINCT 
                    employee_key
             FROM wfms.fact_employee_balance
         ) feb
              CROSS JOIN wfms.dim_balance db
         WHERE dd.date BETWEEN DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31) AND GETDATE())
     SELECT dd.*, 
            t.delta
     INTO wfms.test2
     FROM cte_date dd
          LEFT JOIN wfms.test1 t ON dd.balance_key = t.balance_key
                                  AND dd.employee_key = t.employee_key
                                  AND t.date_key =  (SELECT TOP 1 tt1.date_key
                                                    FROM    wfms.test1 tt1
                                                    WHERE   tt1.balance_key = t.balance_key
                                                    AND     tt1.employee_key = t.employee_key
                                                    AND     tt1.date_key < dd.date_key);

CodePudding user response:

Just for fun, I wanted to test an idea.

For the moment, lets assume the gaps are not too wide ... In this example 7 days.

On a relative to batch, the lag() over() approach was 22% while the Cross Apply was 78%.

Again, Just for fun

Select Employee
      ,Bank
      ,Date
      ,Delta = coalesce(A.Delta
                       ,lag(Delta,1) over (partition by Employee,Bank order by date)
                       ,lag(Delta,2) over (partition by Employee,Bank order by date)
                       ,lag(Delta,3) over (partition by Employee,Bank order by date)
                       ,lag(Delta,4) over (partition by Employee,Bank order by date)
                       ,lag(Delta,5) over (partition by Employee,Bank order by date)
                       ,lag(Delta,6) over (partition by Employee,Bank order by date)
                       ,lag(Delta,7) over (partition by Employee,Bank order by date)
                       )
 From  YourTable A

Versus

Select Employee
      ,Bank
      ,Date
      ,Delta = coalesce(A.Delta,B.Delta)
 From  YourTable A
 Cross Apply ( Select top 1 Delta 
                From  YourTable 
                Where Employee=A.Employee
                  and A.Bank = Bank
                  and Delta is not null
                  and A.Date>=Date
                 Order By Date desc
             ) B

Update

Same results with 20 days

CodePudding user response:

Here is another way. Using sum() with window function to find the group "Grp" of rows (1 row with not null with subsequent rows of null). Finally max(Delta) of the Grp to return the not null value.

select  Employee, Bank, [Date], max  (max(Delta)) 
                                over (partition by Employee, Bank, Grp)
from 
(
    select  *, Grp = sum  (case when Delta is not null then 1 else 0 end) 
                     over (partition by Employee,Bank 
                               order by [Date]) 
    from    YourTable
) t
group by Employee, Bank, [Date], Grp
  • Related