Home > Net >  SQL Server : Using LAG() with calculated previous value
SQL Server : Using LAG() with calculated previous value

Time:11-09

I have data like this in SQL Server 2016 database table:

PERIODE PERIODE_FORECAST VALUE
2021-08-01 2021-01-01 51384.673
2021-08-01 2021-02-01 44118.129
2021-08-01 2021-03-01 43164.446
2021-08-01 2021-04-01 38113.745
2021-08-01 2021-05-01 37306.956
2021-08-01 2021-06-01 38390.359
2021-08-01 2021-07-01 42692.390
2021-08-01 2021-08-01 39814.047
2021-08-01 2021-09-01 0.000
2021-08-01 2021-10-01 0.000
2021-08-01 2021-11-01 0.000
2021-08-01 2021-12-01 0.000

I have a condition to fill out the zero values for example :

CASE 
    WHEN PERIODE_FORECAST > PERIODE 
        THEN [PREVIOUS_MONTH] * 0.1
    ELSE VALUE 
END

I am trying to use the LAG() function, but it didn't come up as I was hoped for.

Query :

SELECT
    PERIODE,PERIODE_FORECAST,
    CASE 
        WHEN PERIODE_FORECAST > PERIODE 
            THEN LAG(VALUE, 1, 0) OVER (ORDER BY PERIODE_FORECAST ASC) * 0.1 
        ELSE VALUE 
    END VALUE
FROM 
    MyTable

Result :

PERIODE PERIODE_FORECAST VALUE
2021-08-01 2021-01-01 51384.673
2021-08-01 2021-02-01 44118.129
2021-08-01 2021-03-01 43164.446
2021-08-01 2021-04-01 38113.745
2021-08-01 2021-05-01 37306.956
2021-08-01 2021-06-01 38390.359
2021-08-01 2021-07-01 42692.390
2021-08-01 2021-08-01 39814.047
2021-08-01 2021-09-01 3981.4047
2021-08-01 2021-10-01 0.000
2021-08-01 2021-11-01 0.000
2021-08-01 2021-12-01 0.000

Expected results:

| PERIODE       | PERIODE_FORECAST      | VALUE     |
| --------------|-----------------------|-----------|
| 2021-08-01    | 2021-01-01            | 51384.673 |
| 2021-08-01    | 2021-02-01            | 44118.129 |
| 2021-08-01    | 2021-03-01            | 43164.446 |
| 2021-08-01    | 2021-04-01            | 38113.745 |
| 2021-08-01    | 2021-05-01            | 37306.956 |
| 2021-08-01    | 2021-06-01            | 38390.359 |
| 2021-08-01    | 2021-07-01            | 42692.390 |
| 2021-08-01    | 2021-08-01            | 39814.047 |
| 2021-08-01    | 2021-09-01            | 3981.4047 |
| 2021-08-01    | 2021-10-01            | 398.14047 |
| 2021-08-01    | 2021-11-01            | 39.814047 |
| 2021-08-01    | 2021-12-01            | 3.9814047 |   
|---------------|-----------------------|-----------|

Is there any solution or workaround without creating stored procedure?

Thanks

CodePudding user response:

Your problem is that you lag the value 0 when you are 2 month after the first periode. I think that you have to use a parameter for the second part of the function lag(). Someting like : lag(VALUE, DATEDIFF(m,convert(date,'2021-08-01'),convert(date,'2021-10-01')) ,0 )

here :

Lag( Value, DATEDIFF(m,convert(date,PERIODE),convert(date,PERIODE_FORECAST)) ,0 )

You maybe have to add this datediff as exponent for the multiplicator 0.1.

CodePudding user response:

This was an interesting question, and my approach is likely not the most performant since I am a programmer, but it does produce the results you expect. I first create a common table expression and number the rows with ROW_NUMBER. Then I join back to the CTE to find the single VALUE with the highest ROW_NUMBER. You can this use the difference in ROW_NUMBERs as a POWER of 10.0000 to get the multiplier.

First I created your data:

DECLARE @Forecast AS TABLE
(
    [PERIODE]          DATE           NOT NULL,
    [PERIODE_FORECAST] DATE           NOT NULL,
    [VALUE]            DECIMAL(16, 8) NOT NULL
);

INSERT INTO @Forecast
(
    [PERIODE],
    [PERIODE_FORECAST],
    [VALUE]
)
VALUES
('2021-08-01', '2021-01-01', 51384.673),
('2021-08-01', '2021-02-01', 44118.129),
('2021-08-01', '2021-03-01', 43164.446),
('2021-08-01', '2021-04-01', 38113.745),
('2021-08-01', '2021-05-01', 37306.956),
('2021-08-01', '2021-06-01', 38390.359),
('2021-08-01', '2021-07-01', 42692.390),
('2021-08-01', '2021-08-01', 39814.047),
('2021-08-01', '2021-09-01', 0.000),
('2021-08-01', '2021-10-01', 0.000),
('2021-08-01', '2021-11-01', 0.000),
('2021-08-01', '2021-12-01', 0.000);

Once I had this, the query is:

;WITH [NumberedRows]
AS (SELECT [PERIODE],
           [PERIODE_FORECAST],
           [VALUE],
           ROW_NUMBER() OVER (PARTITION BY [PERIODE]
                              ORDER BY [PERIODE_FORECAST]
                             ) AS [rn]
    FROM   @Forecast)
SELECT [nr1].[PERIODE],
       [nr1].[PERIODE_FORECAST],
       CASE WHEN [sub].[VALUE] IS NOT NULL THEN
                [sub].[VALUE] * POWER(10.0000000, ([sub].[rn] - [nr1].[rn]))
            ELSE
                [nr1].[VALUE]
       END AS [VALUE]
FROM   [NumberedRows] AS [nr1]
       LEFT OUTER JOIN
       (
           SELECT   TOP(1)
                    [nr2].[PERIODE], [nr2].[PERIODE_FORECAST], [nr2].[VALUE], [nr2].[rn]
           FROM     [NumberedRows] [nr2]
           WHERE    [nr2].[VALUE] <> 0
           ORDER BY [nr2].[rn] DESC
       ) AS [sub]
           ON [sub].[PERIODE] = [nr1].[PERIODE]
              AND [sub].[rn] < [nr1].[rn];

And the results:

PERIODE PERIODE_FORECAST VALUE
2021-08-01 2021-01-01 51384.673000
2021-08-01 2021-02-01 44118.129000
2021-08-01 2021-03-01 43164.446000
2021-08-01 2021-04-01 38113.745000
2021-08-01 2021-05-01 37306.956000
2021-08-01 2021-06-01 38390.359000
2021-08-01 2021-07-01 42692.390000
2021-08-01 2021-08-01 39814.047000
2021-08-01 2021-09-01 3981.404700
2021-08-01 2021-10-01 398.140470
2021-08-01 2021-11-01 39.814047
2021-08-01 2021-12-01 3.981405

CodePudding user response:

This is gaps and island problem where each non-zero value marks the beginning of a new island. Once grouped you can use first_value to pick out the corresponding "previous value" while row_number serves as an offset for computing a multiplier.

with A as (
    select *,
      sum(case when "value" <> 0 then 1 else 0 end)
        over (order by periode_forecast) as grp
    from T
), B as (
    select *,
      first_value("value")
        over (partition by grp order by periode_forecast) as pv,
      power(10e, 1 - row_number()
        over (partition by grp order by periode_forecast)) as mult
    from A
)
select periode_forecast, "value", grp, pv, pv * mult as new_value
from B
order by periode_forecast;

You may want to stick with decimal math and avoid float. If so then adjust the 10e inside the power() reference.

If there is no valid prior non-zero row then the result will be zero. It's not clear whether that will happen in your data or how to treat it differently.

https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=545d6e84ea43885788e45a2fb0393884

  • Related