I am attempting to understand the progression of my observations that are on time relative to when they were expected, regardless of the date they were expected. Therefore, I want to reindex each observation and generate a list that starts at day 0 (on the expected
day) and then calculate forward for 10 more days (arbitrary).
I am testing this in BigQuery:
CREATE TABLE `db.tbl` (
id INTEGER,
expected DATE,
actual DATE
)
INSERT INTO `db.tbl`
( id , expected , actual )
VALUES
( 1 , '2022-01-01' , '2022-01-02' ),
( 2 , '2022-01-11' , '2022-01-20' ),
( 3 , '2022-01-21' , '2022-01-20' )
So, the first row represents an observation that was "missing"/"late"/"not on time" on day 0 (2022-01-01
) and then "on time" from day 1 (2022-01-02
) until the end of my window of interest (day 10).
The second row represents an observation that was "late" from day 0 (2022-01-11
) to day 8 (2022-01-19
) and "on time" after that.
The third row represents an observation that was observed early, so it should be "on time" from day 0 through day 10.
I would want the result to be:
day count fraction
0 1 0.33
1 2 0.67
2 2 0.67
3 2 0.67
4 2 0.67
5 2 0.67
6 2 0.67
7 2 0.67
8 2 0.67
9 3 1.00
10 3 1.00
Is this possible with a SELECT
statement?
CodePudding user response:
CREATE TEMP TABLE sample (
id INTEGER,
expected DATE,
actual DATE
);
INSERT INTO sample
( id , expected , actual )
VALUES
( 1 , '2022-01-01' , '2022-01-02' ),
( 2 , '2022-01-11' , '2022-01-20' ),
( 3 , '2022-01-21' , '2022-01-20' );
WITH observations AS (
SELECT day, COUNTIF(v = '1') AS count, (SELECT COUNT(id) FROM sample) AS total
FROM sample,
UNNEST([IF(DATE_DIFF(actual, expected, DAY) < 0, 0, DATE_DIFF(actual, expected, DAY))]) diff,
UNNEST(SPLIT(REPEAT('0', diff) || REPEAT('1', 10 - diff), '')) v WITH OFFSET day
GROUP BY 1
)
SELECT day, count, ROUND(count / total, 2) AS fraction
FROM observations;
output:
CodePudding user response:
Consider below
select day, sum(ontime) cnt, round(avg(ontime),2) fraction
from (
select day, if(dt < actual, 0, 1) ontime
from your_table,
unnest(generate_array(0,10)) day
left join unnest(generate_date_array(expected, actual)) dt with offset as day
using(day)
)
group by day
if applied to sample data in your question
with your_table as (
select 1 id, date '2022-01-01' expected, date '2022-01-02' actual union all
select 2, '2022-01-11' , '2022-01-20' union all
select 3, '2022-01-21' , '2022-01-20'
)
output is