SQL Server: Using LAG() with calculated previous value

I have data like this in a SQL Server 2016 database table:

PERIODE	PERIODE_FORECAST	VALUE
2021-08-01	2021-01-01	51384.673
2021-08-01	2021-02-01	44118.129
2021-08-01	2021-03-01	43164.446
2021-08-01	2021-04-01	38113.745
2021-08-01	2021-05-01	37306.956
2021-08-01	2021-06-01	38390.359
2021-08-01	2021-07-01	42692.390
2021-08-01	2021-08-01	39814.047
2021-08-01	2021-09-01	0.000
2021-08-01	2021-10-01	0.000
2021-08-01	2021-11-01	0.000
2021-08-01	2021-12-01	0.000

I need to fill in the zero values using a condition such as:

CASE 
    WHEN PERIODE_FORECAST > PERIODE 
        THEN [PREVIOUS_MONTH] * 0.1
    ELSE VALUE 
END

I tried the LAG() function, but it did not produce the result I was hoping for.

Query:

SELECT
    PERIODE,PERIODE_FORECAST,
    CASE 
        WHEN PERIODE_FORECAST > PERIODE 
            THEN LAG(VALUE, 1, 0) OVER (ORDER BY PERIODE_FORECAST ASC) * 0.1 
        ELSE VALUE 
    END VALUE
FROM 
    MyTable

Result:

PERIODE	PERIODE_FORECAST	VALUE
2021-08-01	2021-01-01	51384.673
2021-08-01	2021-02-01	44118.129
2021-08-01	2021-03-01	43164.446
2021-08-01	2021-04-01	38113.745
2021-08-01	2021-05-01	37306.956
2021-08-01	2021-06-01	38390.359
2021-08-01	2021-07-01	42692.390
2021-08-01	2021-08-01	39814.047
2021-08-01	2021-09-01	3981.4047
2021-08-01	2021-10-01	0.000
2021-08-01	2021-11-01	0.000
2021-08-01	2021-12-01	0.000

Expected results:

| PERIODE       | PERIODE_FORECAST      | VALUE     |
| --------------|-----------------------|-----------|
| 2021-08-01    | 2021-01-01            | 51384.673 |
| 2021-08-01    | 2021-02-01            | 44118.129 |
| 2021-08-01    | 2021-03-01            | 43164.446 |
| 2021-08-01    | 2021-04-01            | 38113.745 |
| 2021-08-01    | 2021-05-01            | 37306.956 |
| 2021-08-01    | 2021-06-01            | 38390.359 |
| 2021-08-01    | 2021-07-01            | 42692.390 |
| 2021-08-01    | 2021-08-01            | 39814.047 |
| 2021-08-01    | 2021-09-01            | 3981.4047 |
| 2021-08-01    | 2021-10-01            | 398.14047 |
| 2021-08-01    | 2021-11-01            | 39.814047 |
| 2021-08-01    | 2021-12-01            | 3.9814047 |

Is there a solution or workaround that does not require creating a stored procedure?

Thanks

CodePudding user response：

The problem is that LAG() reads the stored VALUE of the previous row, not the result your CASE expression calculated for it, so from the second month after PERIODE onward it returns 0. I think you have to pass a calculated offset as the second argument of LAG(). Something like: lag(VALUE, DATEDIFF(m,convert(date,'2021-08-01'),convert(date,'2021-10-01')) ,0 )

In your case:

Lag( Value, DATEDIFF(m,convert(date,PERIODE),convert(date,PERIODE_FORECAST)) ,0 )

You may also have to use this DATEDIFF as the exponent of the 0.1 multiplier.
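
One way to read this suggestion, as a minimal and untested sketch: use the month difference both as the LAG() offset and as the exponent of 0.1. It assumes the table is called MyTable as in the question, that PERIODE and PERIODE_FORECAST are date columns and VALUE is a decimal column, and that every PERIODE has exactly one row per month with no gaps, so that a row offset equals a month difference. The offset is clamped to 0 for past months because LAG() does not accept a negative offset, and 0.1 is cast to a wider decimal because POWER() returns the type of its first argument, so POWER(0.1, 2) would round to 0.0. The alias months_ahead is made up for this example.

```sql
-- Sketch only: MyTable and its columns come from the question;
-- months_ahead is a hypothetical helper column.
SELECT
    s.PERIODE,
    s.PERIODE_FORECAST,
    -- For past and current months months_ahead is 0, so LAG() returns the
    -- row's own VALUE and the multiplier is 0.1^0 = 1.
    LAG(s.VALUE, s.months_ahead, 0)
        OVER (PARTITION BY s.PERIODE ORDER BY s.PERIODE_FORECAST)
      -- Scale 9 is enough for the four forecast months in the sample data.
      * POWER(CAST(0.1 AS DECIMAL(18, 9)), s.months_ahead) AS VALUE
FROM (
    SELECT
        PERIODE,
        PERIODE_FORECAST,
        VALUE,
        CASE
            WHEN PERIODE_FORECAST > PERIODE
                THEN DATEDIFF(month, PERIODE, PERIODE_FORECAST)
            ELSE 0
        END AS months_ahead
    FROM MyTable
) AS s
ORDER BY s.PERIODE, s.PERIODE_FORECAST;
```

For the 2021-10-01 row, for example, months_ahead is 2, so LAG() reaches back to the 2021-08-01 row (39814.047) and multiplies it by 0.01. If a month is missing from the table, the row offset no longer matches the month difference and this sketch would pick the wrong row.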

CodePudding user response：

This was an interesting question. My approach is likely not the most performant, since I am a programmer, but it does produce the results you expect. I first create a common table expression and number the rows with ROW_NUMBER. Then I join back to the CTE to find the last non-zero VALUE, that is, the non-zero row with the highest ROW_NUMBER. You can then use the difference between the two ROW_NUMBERs as the exponent in POWER(10.0000000, ...) to get the multiplier.

First I created your data:

DECLARE @Forecast AS TABLE
(
    [PERIODE]          DATE           NOT NULL,
    [PERIODE_FORECAST] DATE           NOT NULL,
    [VALUE]            DECIMAL(16, 8) NOT NULL
);

INSERT INTO @Forecast
(
    [PERIODE],
    [PERIODE_FORECAST],
    [VALUE]
)
VALUES
('2021-08-01', '2021-01-01', 51384.673),
('2021-08-01', '2021-02-01', 44118.129),
('2021-08-01', '2021-03-01', 43164.446),
('2021-08-01', '2021-04-01', 38113.745),
('2021-08-01', '2021-05-01', 37306.956),
('2021-08-01', '2021-06-01', 38390.359),
('2021-08-01', '2021-07-01', 42692.390),
('2021-08-01', '2021-08-01', 39814.047),
('2021-08-01', '2021-09-01', 0.000),
('2021-08-01', '2021-10-01', 0.000),
('2021-08-01', '2021-11-01', 0.000),
('2021-08-01', '2021-12-01', 0.000);

With the data in place, the query is:

;WITH [NumberedRows]
AS (SELECT [PERIODE],
           [PERIODE_FORECAST],
           [VALUE],
           ROW_NUMBER() OVER (PARTITION BY [PERIODE]
                              ORDER BY [PERIODE_FORECAST]
                             ) AS [rn]
    FROM   @Forecast)
SELECT [nr1].[PERIODE],
       [nr1].[PERIODE_FORECAST],
       CASE WHEN [sub].[VALUE] IS NOT NULL THEN
                [sub].[VALUE] * POWER(10.0000000, ([sub].[rn] - [nr1].[rn]))
            ELSE
                [nr1].[VALUE]
       END AS [VALUE]
FROM   [NumberedRows] AS [nr1]
       LEFT OUTER JOIN
       (
           SELECT   TOP(1)
                    [nr2].[PERIODE], [nr2].[PERIODE_FORECAST], [nr2].[VALUE], [nr2].[rn]
           FROM     [NumberedRows] [nr2]
           WHERE    [nr2].[VALUE] <> 0
           ORDER BY [nr2].[rn] DESC
       ) AS [sub]
           ON [sub].[PERIODE] = [nr1].[PERIODE]
              AND [sub].[rn] < [nr1].[rn];

And the results:

PERIODE	PERIODE_FORECAST	VALUE
2021-08-01	2021-01-01	51384.673000
2021-08-01	2021-02-01	44118.129000
2021-08-01	2021-03-01	43164.446000
2021-08-01	2021-04-01	38113.745000
2021-08-01	2021-05-01	37306.956000
2021-08-01	2021-06-01	38390.359000
2021-08-01	2021-07-01	42692.390000
2021-08-01	2021-08-01	39814.047000
2021-08-01	2021-09-01	3981.404700
2021-08-01	2021-10-01	398.140470
2021-08-01	2021-11-01	39.814047
2021-08-01	2021-12-01	3.981405

CodePudding user response：

This is a gaps-and-islands problem in which each non-zero value marks the beginning of a new island. Once the rows are grouped, you can use first_value to pick out the corresponding "previous value", while row_number serves as an offset for computing the multiplier.

with A as (
    select *,
      sum(case when "value" <> 0 then 1 else 0 end)
        over (order by periode_forecast) as grp
    from T
), B as (
    select *,
      first_value("value")
        over (partition by grp order by periode_forecast) as pv,
      power(10e, 1 - row_number()
        over (partition by grp order by periode_forecast)) as mult
    from A
)
select periode_forecast, "value", grp, pv, pv * mult as new_value
from B
order by periode_forecast;

You may want to stick with decimal math and avoid float. If so, adjust the 10e literal inside the power() call.

If there is no valid prior non-zero row then the result will be zero. It's not clear whether that will happen in your data or how to treat it differently.

https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=545d6e84ea43885788e45a2fb0393884
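
If the table holds more than one PERIODE, or if you prefer decimal math, the query in this answer might be adapted along the following lines. This is an untested sketch, not the answerer's own code: it keeps the table name T from the answer, adds partition by periode to every window (an assumption about how the rows should be grouped), and replaces the float literal with a cast to DECIMAL(18, 9), which is an arbitrary choice of precision.

```sql
-- Sketch: same gaps-and-islands approach, grouped per periode and using
-- a decimal multiplier instead of float.
with A as (
    select *,
      -- running count of non-zero rows: each non-zero row starts a new group
      sum(case when "value" <> 0 then 1 else 0 end)
        over (partition by periode order by periode_forecast) as grp
    from T
), B as (
    select *,
      -- the first row of each group is the non-zero value the zeros derive from
      first_value("value")
        over (partition by periode, grp order by periode_forecast) as pv,
      -- 10^0 for the first row of a group, 10^-1 for the next, and so on
      power(cast(10 as decimal(18, 9)), 1 - row_number()
        over (partition by periode, grp order by periode_forecast)) as mult
    from A
)
select periode, periode_forecast, pv * mult as new_value
from B
order by periode, periode_forecast;
```

As with the original query, zero rows that come before any non-zero row of the same periode fall into group 0 and stay at zero.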