Home > OS >  Count number of null rows along with the first non null row and reset the count after that
Count number of null rows along with the first non null row and reset the count after that

Time:11-09

HOUR Account_id media_id impressions
2022-11-04 04:00:00 UTC 256789 35 null
2022-11-04 05:00:00 UTC 256789 35 null
2022-11-04 06:00:00 UTC 256789 35 null
2022-11-04 07:00:00 UTC 256789 35 null
2022-11-04 08:00:00 UTC 256789 35 40
2022-11-04 09:00:00 UTC 256789 35 7
2022-11-04 10:00:00 UTC 256789 35 null
2022-11-04 11:00:00 UTC 256789 35 10
2022-11-04 12:00:00 UTC 256789 35 12

What we are trying to do is that when the impressions is count is null for an hour, then we take the value from the impressions where it is not null and then split the number evenly across the previously consecutive null rows and the first non null row.

If we take the row where the impressions count is 40 in the above 4 rows the impressions is null so including the row where the impressions is 40 makes the count as 5, then we divide 40 by 5 hence each hour gets 8 impressions.

The same above logic can be applied to the row where the impressions count is 10. It is distributed between 2 rows evenly hence in the output it is 5 impressions for each hour.

Here HOUR column is an increment of one hour with no gaps in between.

The query looks like this:

select *,
      case when impressions is null then row_number() over(partition by media_id,ACCOUNT_ID ORDER BY HOUR) else 0 end as rn1,
from table_name order by 1 ;

How I take it from there?

Expected Output:

HOUR Account_id media_id impressions distributed_impressions
2022-11-04 04:00:00 UTC 256789 35 null 8
2022-11-04 05:00:00 UTC 256789 35 null 8
2022-11-04 06:00:00 UTC 256789 35 null 8
2022-11-04 07:00:00 UTC 256789 35 null 8
2022-11-04 08:00:00 UTC 256789 35 40 8
2022-11-04 09:00:00 UTC 256789 35 7 7
2022-11-04 10:00:00 UTC 256789 35 null 5
2022-11-04 11:00:00 UTC 256789 35 10 5
2022-11-04 12:00:00 UTC 256789 35 12 12

CodePudding user response:

Consider below query.

SELECT * EXCEPT(part),
       MAX(impressions) OVER w1 / COUNT(*) OVER W1 AS distributed_impressions 
  FROM (
    SELECT *, COUNT(*) OVER w0 - COUNTIF(impressions IS NULL) OVER w0 AS part
      FROM sample_table
    WINDOW w0 AS (PARTITION BY Account_id, media_id ORDER BY HOUR ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
  ) WINDOW w1 AS (PARTITION BY Account_id, media_id, part);

Query results

enter image description here

How each island is identified

enter image description here

CodePudding user response:

I have a cursor solution for this...

declare  @T table (hr datetime, Account_id int, media_id int,   impressions decimal(5,2))

insert into @t
values
('2022-11-04 04:00:00'  ,256789 ,35 ,null)
,('2022-11-04 05:00:00' ,256789 ,35 ,null)
,('2022-11-04 06:00:00' ,256789 ,35 ,null)
,('2022-11-04 07:00:00' ,256789 ,35 ,null)
,('2022-11-04 08:00:00' ,256789 ,35 ,40)
,('2022-11-04 09:00:00' ,256789 ,35 ,7)
,('2022-11-04 10:00:00' ,256789 ,35 ,null)
,('2022-11-04 11:00:00' ,256789 ,35 ,10)
,('2022-11-04 12:00:00' ,256789 ,35 ,12)

declare @prevHr datetime
    ,@currHr datetime
    ,@imp int
    ,@AvgImp decimal(5,2)
    ,@GapCt int

declare csr cursor 
for
select hr,impressions
from @t
where impressions is not null
order by hr

open csr

fetch next from csr into @currHr,@imp

--Handle the starting point
select @GapCt = count(*)
from @t
where hr <= @currHr

set @AvgImp = @imp/@GapCt

update @t
set impressions = @AvgImp
where hr <= @currHr

set @prevHr = @currHr

fetch next from csr into @currHr,@imp

while @@FETCH_STATUS=0
BEGIN
    select @GapCt = count(*)
    from @t
    where hr <= @currHr
        and hr > @prevHr

    set @AvgImp = @imp/@GapCt

    update @t
    set impressions = @AvgImp
    where hr <= @currHr
        and hr > @prevHr

    set @prevHr = @currHr

    fetch next from csr into @currHr,@imp
END

close csr
deallocate csr

select * from @t

Results:

hr  Account_id  media_id    impressions
2022-11-04 04:00:00.000 256789  35  8.00
2022-11-04 05:00:00.000 256789  35  8.00
2022-11-04 06:00:00.000 256789  35  8.00
2022-11-04 07:00:00.000 256789  35  8.00
2022-11-04 08:00:00.000 256789  35  8.00
2022-11-04 09:00:00.000 256789  35  7.00
2022-11-04 10:00:00.000 256789  35  5.00
2022-11-04 11:00:00.000 256789  35  5.00
2022-11-04 12:00:00.000 256789  35  12.00
  • Related