Home > Back-end >  Function to REPLACE* last previous known value for NULL
Function to REPLACE* last previous known value for NULL

Time:02-25

I want to fill the NULL values with the last given value for that column. A small sample of the data:

2021-08-15  Bulgaria    1081636
2021-08-16  Bulgaria    1084693
2021-08-17  Bulgaria    1089066
2021-08-18  Bulgaria    NULL
2021-08-19  Bulgaria    NULL

In this example, the NULL values should be 1089066 until I reach the next non-NULL value.

I tried the answer given in this response, but to no avail. Any help would be appreciated, thank you!

EDIT: Sorry, I got sidetracked with trying to return the last value that I forgot my ultimate goal, which is to replace the NULL values with the previous known value.

Therefore the query should be

UPDATE covid_data

SET people_vaccinated = ISNULL(?)

CodePudding user response:

Assuming the number you have is always increasing, you can use MAX aggregate over a window:

SELECT dt
     , country
     , cnt
     , MAX(cnt) OVER (PARTITION BY country ORDER BY dt)
  FROM #data

If the number may decrease, the query becomes a little bit more complex as we need to mark the rows that have nulls as belonging to the same group as the last one without a null first:

SELECT dt
     , country
     , cnt
     , SUM(cnt) OVER (PARTITION BY country, partition)
FROM (
    SELECT country
         , dt
         , cnt
         , SUM(CASE WHEN cnt IS NULL THEN 0 ELSE 1 END) OVER (PARTITION BY country ORDER BY dt) AS partition
    FROM #data
) AS d
ORDER BY dt

Here's a working demo on dbfiddle, it returns the same data with ever increasing amount, but if you change the number for 08-17 to be lower than that of 08-16, you'll see MAX(...) method producing wrong results.

CodePudding user response:

In many datasets it is incorrect to make assumptions about the behaviour of the data in the underlying dataset, if your goal is simply to fill the blanks that might appear mid-way in a dataset then the answer to the post you referenced A:sql server nulls duplicate last known value in table is still one of the best solutions, here is an adaptation:

SELECT dt
     , country
     , cnt
     , ISNULL(source.cnt, excludeNulls.LastCnt)
  FROM #data source
  OUTER APPLY ( SELECT TOP 1 cnt as LastCnt
                FROM  #data 
                WHERE dt < source.dt
                AND cnt IS NOT NULL
                ORDER BY dt desc) ExcludeNulls
 ORDER BY dt

MAX and LAST_VALUE will give you the a value with respect to the entire record set, which would not work with the existing solutions if you had a value for 2021-08-19. In that case the last value would be used to fill the gaps, not the previous non-null value.

When we need to fill in gaps that occur part-way through the results we need to apply a filter to the window query, TOP 1 ... ORDER BY gives us the ability to filter and sort on entirely different fields to the one that we want to capture, but also means that we can display the last value for fields that are not numeric, see this fiddle a few other examples: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=372285d29f97dbb9663e8552af6fb7a2

  • Related