I want to fill the NULL values with the last given value for that column. A small sample of the data:
2021-08-15 Bulgaria 1081636
2021-08-16 Bulgaria 1084693
2021-08-17 Bulgaria 1089066
2021-08-18 Bulgaria NULL
2021-08-19 Bulgaria NULL
In this example, the NULL values should be 1089066 until I reach the next non-NULL value.
I tried the answer given in this response, but to no avail. Any help would be appreciated, thank you!
EDIT: Sorry, I got sidetracked with trying to return the last value that I forgot my ultimate goal, which is to replace the NULL values with the previous known value.
Therefore the query should be
UPDATE covid_data
SET people_vaccinated = ISNULL(?)
CodePudding user response:
Assuming the number you have is always increasing, you can use MAX aggregate over a window:
SELECT dt
, country
, cnt
, MAX(cnt) OVER (PARTITION BY country ORDER BY dt)
FROM #data
If the number may decrease, the query becomes a little bit more complex as we need to mark the rows that have nulls as belonging to the same group as the last one without a null first:
SELECT dt
, country
, cnt
, SUM(cnt) OVER (PARTITION BY country, partition)
FROM (
SELECT country
, dt
, cnt
, SUM(CASE WHEN cnt IS NULL THEN 0 ELSE 1 END) OVER (PARTITION BY country ORDER BY dt) AS partition
FROM #data
) AS d
ORDER BY dt
Here's a working demo on dbfiddle, it returns the same data with ever increasing amount, but if you change the number for 08-17 to be lower than that of 08-16, you'll see MAX(...) method producing wrong results.
CodePudding user response:
In many datasets it is incorrect to make assumptions about the behaviour of the data in the underlying dataset, if your goal is simply to fill the blanks that might appear mid-way in a dataset then the answer to the post you referenced A:sql server nulls duplicate last known value in table is still one of the best solutions, here is an adaptation:
SELECT dt
, country
, cnt
, ISNULL(source.cnt, excludeNulls.LastCnt)
FROM #data source
OUTER APPLY ( SELECT TOP 1 cnt as LastCnt
FROM #data
WHERE dt < source.dt
AND cnt IS NOT NULL
ORDER BY dt desc) ExcludeNulls
ORDER BY dt
MAX
and LAST_VALUE
will give you the a value with respect to the entire record set, which would not work with the existing solutions if you had a value for 2021-08-19
. In that case the last value would be used to fill the gaps, not the previous non-null value.
When we need to fill in gaps that occur part-way through the results we need to apply a filter to the window query, TOP 1 ... ORDER BY
gives us the ability to filter and sort on entirely different fields to the one that we want to capture, but also means that we can display the last value for fields that are not numeric, see this fiddle a few other examples: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=372285d29f97dbb9663e8552af6fb7a2