HOUR | Account_id | media_id | impressions |
---|---|---|---|
2022-11-04 04:00:00 UTC | 256789 | 35 | null |
2022-11-04 05:00:00 UTC | 256789 | 35 | null |
2022-11-04 06:00:00 UTC | 256789 | 35 | null |
2022-11-04 07:00:00 UTC | 256789 | 35 | null |
2022-11-04 08:00:00 UTC | 256789 | 35 | 40 |
2022-11-04 09:00:00 UTC | 256789 | 35 | 7 |
2022-11-04 10:00:00 UTC | 256789 | 35 | null |
2022-11-04 11:00:00 UTC | 256789 | 35 | 10 |
2022-11-04 12:00:00 UTC | 256789 | 35 | 12 |
What we are trying to do is that when the impressions is count is null for an hour, then we take the value from the impressions where it is not null and then split the number evenly across the previously consecutive null rows and the first non null row.
If we take the row where the impressions count is 40
in the above 4 rows the impressions is null so including the row where the impressions is 40
makes the count as 5
, then we divide 40
by 5
hence each hour gets 8
impressions.
The same above logic can be applied to the row where the impressions count is 10
.
It is distributed between 2
rows evenly hence in the output it is 5
impressions for each hour.
Here HOUR
column is an increment of one hour with no gaps in between.
The query looks like this:
select *,
case when impressions is null then row_number() over(partition by media_id,ACCOUNT_ID ORDER BY HOUR) else 0 end as rn1,
from table_name order by 1 ;
How I take it from there?
Expected Output:
HOUR | Account_id | media_id | impressions | distributed_impressions |
---|---|---|---|---|
2022-11-04 04:00:00 UTC | 256789 | 35 | null | 8 |
2022-11-04 05:00:00 UTC | 256789 | 35 | null | 8 |
2022-11-04 06:00:00 UTC | 256789 | 35 | null | 8 |
2022-11-04 07:00:00 UTC | 256789 | 35 | null | 8 |
2022-11-04 08:00:00 UTC | 256789 | 35 | 40 | 8 |
2022-11-04 09:00:00 UTC | 256789 | 35 | 7 | 7 |
2022-11-04 10:00:00 UTC | 256789 | 35 | null | 5 |
2022-11-04 11:00:00 UTC | 256789 | 35 | 10 | 5 |
2022-11-04 12:00:00 UTC | 256789 | 35 | 12 | 12 |
CodePudding user response:
Consider below query.
SELECT * EXCEPT(part),
MAX(impressions) OVER w1 / COUNT(*) OVER W1 AS distributed_impressions
FROM (
SELECT *, COUNT(*) OVER w0 - COUNTIF(impressions IS NULL) OVER w0 AS part
FROM sample_table
WINDOW w0 AS (PARTITION BY Account_id, media_id ORDER BY HOUR ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) WINDOW w1 AS (PARTITION BY Account_id, media_id, part);
Query results
How each island is identified
CodePudding user response:
I have a cursor solution for this...
declare @T table (hr datetime, Account_id int, media_id int, impressions decimal(5,2))
insert into @t
values
('2022-11-04 04:00:00' ,256789 ,35 ,null)
,('2022-11-04 05:00:00' ,256789 ,35 ,null)
,('2022-11-04 06:00:00' ,256789 ,35 ,null)
,('2022-11-04 07:00:00' ,256789 ,35 ,null)
,('2022-11-04 08:00:00' ,256789 ,35 ,40)
,('2022-11-04 09:00:00' ,256789 ,35 ,7)
,('2022-11-04 10:00:00' ,256789 ,35 ,null)
,('2022-11-04 11:00:00' ,256789 ,35 ,10)
,('2022-11-04 12:00:00' ,256789 ,35 ,12)
declare @prevHr datetime
,@currHr datetime
,@imp int
,@AvgImp decimal(5,2)
,@GapCt int
declare csr cursor
for
select hr,impressions
from @t
where impressions is not null
order by hr
open csr
fetch next from csr into @currHr,@imp
--Handle the starting point
select @GapCt = count(*)
from @t
where hr <= @currHr
set @AvgImp = @imp/@GapCt
update @t
set impressions = @AvgImp
where hr <= @currHr
set @prevHr = @currHr
fetch next from csr into @currHr,@imp
while @@FETCH_STATUS=0
BEGIN
select @GapCt = count(*)
from @t
where hr <= @currHr
and hr > @prevHr
set @AvgImp = @imp/@GapCt
update @t
set impressions = @AvgImp
where hr <= @currHr
and hr > @prevHr
set @prevHr = @currHr
fetch next from csr into @currHr,@imp
END
close csr
deallocate csr
select * from @t
Results:
hr Account_id media_id impressions
2022-11-04 04:00:00.000 256789 35 8.00
2022-11-04 05:00:00.000 256789 35 8.00
2022-11-04 06:00:00.000 256789 35 8.00
2022-11-04 07:00:00.000 256789 35 8.00
2022-11-04 08:00:00.000 256789 35 8.00
2022-11-04 09:00:00.000 256789 35 7.00
2022-11-04 10:00:00.000 256789 35 5.00
2022-11-04 11:00:00.000 256789 35 5.00
2022-11-04 12:00:00.000 256789 35 12.00