I have a Spark SQL table in Databricks with missing (null) values, of the following form:
date        price  stock  item_code
2020-01-01  17.99  6      01
2020-01-02  17.99  null   01
2020-01-03  17.99  4      01
2020-01-04  17.99  null   01
2020-01-01  11.59  null   02
2020-01-02  11.59  8      02
2020-01-03  11.59  null   02
2020-01-04  11.59  4      02
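In case it helps, here is the sample data as a temp view named t (the column types are my best guess at what Databricks infers):

CREATE OR REPLACE TEMP VIEW t AS
SELECT * FROM VALUES
  (DATE'2020-01-01', 17.99, 6,    '01'),
  (DATE'2020-01-02', 17.99, NULL, '01'),
  (DATE'2020-01-03', 17.99, 4,    '01'),
  (DATE'2020-01-04', 17.99, NULL, '01'),
  (DATE'2020-01-01', 11.59, NULL, '02'),
  (DATE'2020-01-02', 11.59, 8,    '02'),
  (DATE'2020-01-03', 11.59, NULL, '02'),
  (DATE'2020-01-04', 11.59, 4,    '02')
AS t(date, price, stock, item_code);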
For each item_code, I have data across the same range of date values. The stock column is subject to reporting errors and missing values, and sometimes those missing values occur at the start or end of the date range.

Where an item has recorded values both before and after a null, I'd like to interpolate with a simple average of those two values. Where the missing value sits at the start or end of the date range, I'd like to fill in the first or last available stock value for that item_code. My objective table is like this:
date        price  stock  item_code
2020-01-01  17.99  6      01
2020-01-02  17.99  5      01
2020-01-03  17.99  4      01
2020-01-04  17.99  4      01
2020-01-01  11.59  8      02
2020-01-02  11.59  8      02
2020-01-03  11.59  6      02
2020-01-04  11.59  4      02
I think I could use a window function to find the average of the before/after period, but I'm not sure how to build the window. Any advice would be appreciated.
CodePudding user response:
An analytic (window) function should help here. AVG ignores NULLs, so averaging over a frame of the previous, current, and next row gives the mean of both neighbours when the NULL sits between two recorded values, and the single available neighbour when it falls at the start or end of the range:
SELECT
  date,
  price,
  CASE
    WHEN stock IS NULL THEN AVG(stock) OVER (
      PARTITION BY item_code
      ORDER BY date
      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    )
    ELSE stock
  END AS stock,
  item_code
FROM t
ORDER BY item_code, date
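One caveat, assuming your real data can have wider gaps than the sample: the ±1-row frame only sees the immediately adjacent rows, so two or more consecutive NULLs (or a leading/trailing NULL more than one row from the nearest recorded value) would still come back NULL. A sketch that generalises, using Spark's null-ignoring LAST/FIRST aggregates to carry the nearest recorded value from each side (the prev_stock/next_stock aliases are mine):

SELECT
  date,
  price,
  COALESCE(
    stock,                          -- keep recorded values as-is
    (prev_stock + next_stock) / 2,  -- average when both sides exist
    prev_stock,                     -- trailing gap: carry last value forward
    next_stock                      -- leading gap: carry first value back
  ) AS stock,
  item_code
FROM (
  SELECT
    *,
    LAST(stock, TRUE) OVER (
      PARTITION BY item_code ORDER BY date
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS prev_stock,
    FIRST(stock, TRUE) OVER (
      PARTITION BY item_code ORDER BY date
      ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
    ) AS next_stock
  FROM t
) sub
ORDER BY item_code, date

On the sample data this produces the same objective table, and it also survives longer runs of missing values.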