Home > OS >  SQL Implementing Forward Fill logic
SQL Implementing Forward Fill logic

Time:08-31

I have a dataset within a date range which has three columns, Product_type, date and metric. For a given product_type, data is not available for all days. For the missing rows, we would like to do a forward date fill for next n days using the last value of the metric.

Product_type date metric
A 2019-10-01 10
A 2019-10-02 12
A 2019-10-03 15
A 2019-10-04 5
A 2019-10-05 5
A 2019-10-06 5
A 2019-10-16 12
A 2019-10-17 23
A 2019-10-18 34

Here, the data from 2019-10-04 to 2019-10-06, has been forward filled. There might be bigger gaps in the dates, but we only want to fill the first n days.

Here, n=2, so rows 5 and 6 has been forward filled.

I am not sure how to implement this logic in SQL.

CodePudding user response:

Here's one option. Read comments within code.

Sample data:

SQL> WITH
  2     test (product_type, datum, metric)
  3     AS
  4        (SELECT 'A', DATE '2019-10-01', 10 FROM DUAL
  5         UNION ALL
  6         SELECT 'A', DATE '2019-10-02', 12 FROM DUAL
  7         UNION ALL
  8         SELECT 'A', DATE '2019-10-03', 15 FROM DUAL
  9         UNION ALL
 10         SELECT 'A', DATE '2019-10-04', 5 FROM DUAL
 11         UNION ALL
 12         SELECT 'A', DATE '2019-10-16', 12 FROM DUAL
 13         UNION ALL
 14         SELECT 'A', DATE '2019-10-18', 23 FROM DUAL),

Query begins here:

 15     temp
 16     AS
 17     -- CB_FWD_FILL = 1 if difference between two consecutive dates is larger than 1 day
 18     -- (i.e. that's the gap to be forward filled)
 19        (SELECT product_type,
 20                datum,
 21                metric,
 22                LEAD (datum) OVER (PARTITION BY product_type ORDER BY datum)
 23                   next_datum,
 24                CASE
 25                   WHEN   LEAD (datum)
 26                             OVER (PARTITION BY product_type ORDER BY datum)
 27                        - datum >
 28                        1
 29                   THEN
 30                      1
 31                   ELSE
 32                      0
 33                END
 34                   cb_fwd_fill
 35           FROM test)
 36      -- original data from the table
 37      SELECT product_type, datum, metric FROM test
 38      UNION ALL
 39      -- DATUM is the last date which is OK; add LEVEL pseudocolumn to it to fill the gap
 40      -- with PAR_N number of rows
 41      SELECT product_type, datum   LEVEL, metric
 42        FROM (SELECT product_type, datum, metric
 43                FROM (-- RN = 1 means that that's the first gap in data set - that's the one
 44                      -- that has to be forward filled
 45                      SELECT product_type,
 46                             datum,
 47                             metric,
 48                             ROW_NUMBER ()
 49                                OVER (PARTITION BY product_type ORDER BY datum) rn
 50                        FROM temp
 51                       WHERE cb_fwd_fill = 1)
 52               WHERE rn = 1)
 53  CONNECT BY LEVEL <= &par_n
 54    ORDER BY datum;

Result:

Enter value for par_n: 2

PRODUCT_TYPE    DATUM          METRIC
--------------- ---------- ----------
A               2019-10-01         10
A               2019-10-02         12
A               2019-10-03         15
A               2019-10-04          5
A               2019-10-05          5     --> newly added
A               2019-10-06          5     --> rows
A               2019-10-16         12
A               2019-10-18         23

8 rows selected.

SQL>

CodePudding user response:

Another solution:

WITH test (product_type, datum, metric)     AS
(
    SELECT 'A', DATE '2019-10-01', 10 FROM DUAL
    UNION ALL
    SELECT 'A', DATE '2019-10-02', 12 FROM DUAL
    UNION ALL
    SELECT 'A', DATE '2019-10-03', 15 FROM DUAL
    UNION ALL
    SELECT 'A', DATE '2019-10-04', 5 FROM DUAL
    UNION ALL
    SELECT 'A', DATE '2019-10-16', 12 FROM DUAL
    UNION ALL
    SELECT 'A', DATE '2019-10-18', 23 FROM DUAL
),
minmax(mindatum, maxdatum) AS (
    SELECT MIN(datum), max(datum) from test
),
alldates (datum, product_type) AS
(
    SELECT mindatum   level - 1, t.product_type FROM  minmax,
    (select distinct product_type from test) t
    connect by mindatum   level <= (select maxdatum from minmax)
),
grouped as (
    select a.datum, a.product_type, t.metric, 
        count(t.product_type) over(partition by a.product_type order by a.datum) as grp
    from alldates a
    left join test t on t.datum = a.datum
),
final_table as (
    select g.datum, g.product_type, g.grp, g.rn,
        last_value(g.metric ignore nulls) over(partition by g.product_type order by g.datum) as metric
    from (
        select g.*, row_number() over(partition by product_type, grp order by datum) - 1 as rn
        from grouped g
    ) g
)
select datum, product_type, metric 
from final_table
where rn <= &par_n
order by datum
;
  • Related