SQL: How to Query by Reindexing Dates to Calculate Generic Daily Prognosis-CodePudding

I am attempting to understand the progression of my observations that are on time relative to when they were expected, regardless of the date they were expected. Therefore, I want to reindex each observation and generate a list that starts at day 0 (on the expected day) and then calculate forward for 10 more days (arbitrary).

I am testing this in BigQuery:

CREATE TABLE `db.tbl` (
  id INTEGER,
  expected DATE,
  actual DATE
)

INSERT INTO `db.tbl`
( id , expected , actual )
VALUES
( 1 , '2022-01-01' , '2022-01-02' ),
( 2 , '2022-01-11' , '2022-01-20' ),
( 3 , '2022-01-21' , '2022-01-20' )

So, the first row represents an observation that was "missing"/"late"/"not on time" on day 0 (2022-01-01) and then "on time" from day 1 (2022-01-02) until the end of my window of interest (day 10).

The second row represents an observation that was "late" from day 0 (2022-01-11) to day 8 (2022-01-19) and "on time" after that.

The third row represents an observation that was observed early, so it should be "on time" from day 0 through day 10.

I would want the result to be:

day  count fraction
  0      1     0.33
  1      2     0.67
  2      2     0.67
  3      2     0.67
  4      2     0.67
  5      2     0.67
  6      2     0.67
  7      2     0.67
  8      2     0.67
  9      3     1.00
 10      3     1.00

Is this possible with a SELECT statement?

CodePudding user response：

CREATE TEMP TABLE sample (
  id INTEGER,
  expected DATE,
  actual DATE
);
INSERT INTO sample
( id , expected , actual )
VALUES
( 1 , '2022-01-01' , '2022-01-02' ),
( 2 , '2022-01-11' , '2022-01-20' ),
( 3 , '2022-01-21' , '2022-01-20' );


WITH observations AS (
  SELECT day, COUNTIF(v = '1') AS count, (SELECT COUNT(id) FROM sample) AS total
    FROM sample,
         UNNEST([IF(DATE_DIFF(actual, expected, DAY) < 0, 0, DATE_DIFF(actual, expected, DAY))]) diff,
         UNNEST(SPLIT(REPEAT('0', diff) || REPEAT('1', 10 - diff), '')) v WITH OFFSET day
   GROUP BY 1
)
SELECT day, count, ROUND(count / total, 2) AS fraction
  FROM observations;

output:

CodePudding user response：

Consider below

select day, sum(ontime) cnt, round(avg(ontime),2) fraction
from (
  select day, if(dt < actual, 0, 1) ontime
  from your_table, 
  unnest(generate_array(0,10)) day
  left join unnest(generate_date_array(expected, actual)) dt with offset as day 
  using(day)
)
group by day

if applied to sample data in your question

with your_table as (
  select 1 id, date '2022-01-01' expected, date '2022-01-02' actual union all
  select 2, '2022-01-11' , '2022-01-20' union all
  select 3, '2022-01-21' , '2022-01-20' 
)

output is