I have a dataset within a date range which has three columns, Product_type, date and metric. For a given product_type, data is not available for all days. For the missing rows, we would like to do a forward date fill for next n days using the last value of the metric.
Product_type | date | metric |
---|---|---|
A | 2019-10-01 | 10 |
A | 2019-10-02 | 12 |
A | 2019-10-03 | 15 |
A | 2019-10-04 | 5 |
A | 2019-10-05 | 5 |
A | 2019-10-06 | 5 |
A | 2019-10-16 | 12 |
A | 2019-10-17 | 23 |
A | 2019-10-18 | 34 |
Here, the data from 2019-10-04 to 2019-10-06, has been forward filled. There might be bigger gaps in the dates, but we only want to fill the first n days.
Here, n=2, so rows 5 and 6 has been forward filled.
I am not sure how to implement this logic in SQL.
CodePudding user response:
Here's one option. Read comments within code.
Sample data:
SQL> WITH
2 test (product_type, datum, metric)
3 AS
4 (SELECT 'A', DATE '2019-10-01', 10 FROM DUAL
5 UNION ALL
6 SELECT 'A', DATE '2019-10-02', 12 FROM DUAL
7 UNION ALL
8 SELECT 'A', DATE '2019-10-03', 15 FROM DUAL
9 UNION ALL
10 SELECT 'A', DATE '2019-10-04', 5 FROM DUAL
11 UNION ALL
12 SELECT 'A', DATE '2019-10-16', 12 FROM DUAL
13 UNION ALL
14 SELECT 'A', DATE '2019-10-18', 23 FROM DUAL),
Query begins here:
15 temp
16 AS
17 -- CB_FWD_FILL = 1 if difference between two consecutive dates is larger than 1 day
18 -- (i.e. that's the gap to be forward filled)
19 (SELECT product_type,
20 datum,
21 metric,
22 LEAD (datum) OVER (PARTITION BY product_type ORDER BY datum)
23 next_datum,
24 CASE
25 WHEN LEAD (datum)
26 OVER (PARTITION BY product_type ORDER BY datum)
27 - datum >
28 1
29 THEN
30 1
31 ELSE
32 0
33 END
34 cb_fwd_fill
35 FROM test)
36 -- original data from the table
37 SELECT product_type, datum, metric FROM test
38 UNION ALL
39 -- DATUM is the last date which is OK; add LEVEL pseudocolumn to it to fill the gap
40 -- with PAR_N number of rows
41 SELECT product_type, datum LEVEL, metric
42 FROM (SELECT product_type, datum, metric
43 FROM (-- RN = 1 means that that's the first gap in data set - that's the one
44 -- that has to be forward filled
45 SELECT product_type,
46 datum,
47 metric,
48 ROW_NUMBER ()
49 OVER (PARTITION BY product_type ORDER BY datum) rn
50 FROM temp
51 WHERE cb_fwd_fill = 1)
52 WHERE rn = 1)
53 CONNECT BY LEVEL <= &par_n
54 ORDER BY datum;
Result:
Enter value for par_n: 2
PRODUCT_TYPE DATUM METRIC
--------------- ---------- ----------
A 2019-10-01 10
A 2019-10-02 12
A 2019-10-03 15
A 2019-10-04 5
A 2019-10-05 5 --> newly added
A 2019-10-06 5 --> rows
A 2019-10-16 12
A 2019-10-18 23
8 rows selected.
SQL>
CodePudding user response:
Another solution:
WITH test (product_type, datum, metric) AS
(
SELECT 'A', DATE '2019-10-01', 10 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-02', 12 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-03', 15 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-04', 5 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-16', 12 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-18', 23 FROM DUAL
),
minmax(mindatum, maxdatum) AS (
SELECT MIN(datum), max(datum) from test
),
alldates (datum, product_type) AS
(
SELECT mindatum level - 1, t.product_type FROM minmax,
(select distinct product_type from test) t
connect by mindatum level <= (select maxdatum from minmax)
),
grouped as (
select a.datum, a.product_type, t.metric,
count(t.product_type) over(partition by a.product_type order by a.datum) as grp
from alldates a
left join test t on t.datum = a.datum
),
final_table as (
select g.datum, g.product_type, g.grp, g.rn,
last_value(g.metric ignore nulls) over(partition by g.product_type order by g.datum) as metric
from (
select g.*, row_number() over(partition by product_type, grp order by datum) - 1 as rn
from grouped g
) g
)
select datum, product_type, metric
from final_table
where rn <= &par_n
order by datum
;