Home > Back-end >  Rolling count daily active drivers since beginning
Rolling count daily active drivers since beginning

Time:05-31

I have a list of drivers, orders, and dates in a table named all_data between 2022-01-01 and 2022-01-15 (15 days) like this:

driver_id order_id order_date
1 a 2022-01-01
1 b 2022-01-02
2 c 2022-01-01
2 d 2022-01-03

For all 15 days, how do I find the number of continually active drivers, who completed at least one order every single day, up to that date? The output should be a table like this:

order_date active_drivers
2022-01-01 30
2022-01-02 27
2022-01-03 25

For example, on 2022-01-01, there were 30 unique drivers who completed at least one order that day. On 2022-01-02, we must find the number of unique drivers who completed at least one order on 2022-01-01 and 2022-01-02. On 2022-01-03, we must count drivers who completed at least one order on 2022-01-01, 2022-01-02, and 2022-01-03.

What I have tried

I found a similar solution in MySQL (below) but it is not allowed in bigquery because of the error "Unsupported subquery with table in join predicate".

MySQL
SELECT order_date, 
 (SELECT COUNT(distinct s1.driver_id) as num_hackers 
   FROM all_data s2 
   join all_data s1 
   on s2. order_date = s1. order_date and     
     (SELECT COUNT(distinct s3. order_date) 
      FROM all_data s3 
      WHERE s3.driver_id = s2.driver_id 
        AND s3. order_date < s1. order_date) 
     = datediff(s1. order_date, date('2022-01-01'), day)
))
from all_data 

I also read this Google BigQuery: Rolling Count Distinct question but that is for a fixed 45 number of days, while the number of days here is a variable based on the date. How do I write a query in BigQuerySQL to find the rolling number of continually active drivers per day?

CodePudding user response:

First find out all combination of dates and drivers, then just get count of all drivers per date. Try this:

select order_date, count(*)
from(
    select order_date, driver_id, count(*)
    from all_data ad
    group by order_date, driver_id)
group by order_date

CodePudding user response:

Consider below

select order_date, count(distinct if(flag, driver_id, null)) active_drivers
from (
  select order_date, driver_id, 
    row_number() over(partition by driver_id order by order_date) - 
    date_diff(order_date, min(order_date) over(), day) = 1 as flag
  from (select distinct order_date, driver_id from all_data)
)
group by order_date
  • Related