I have a list of drivers, orders, and dates in a table named all_data
between 2022-01-01 and 2022-01-15 (15 days) like this:
driver_id | order_id | order_date |
---|---|---|
1 | a | 2022-01-01 |
1 | b | 2022-01-02 |
2 | c | 2022-01-01 |
2 | d | 2022-01-03 |
For all 15 days, how do I find the number of continually active drivers, who completed at least one order every single day, up to that date? The output should be a table like this:
order_date | active_drivers |
---|---|
2022-01-01 | 30 |
2022-01-02 | 27 |
2022-01-03 | 25 |
For example, on 2022-01-01, there were 30 unique drivers who completed at least one order that day. On 2022-01-02, we must find the number of unique drivers who completed at least one order on 2022-01-01 and 2022-01-02. On 2022-01-03, we must count drivers who completed at least one order on 2022-01-01, 2022-01-02, and 2022-01-03.
What I have tried
I found a similar solution in MySQL (below) but it is not allowed in bigquery because of the error "Unsupported subquery with table in join predicate".
MySQL
SELECT order_date,
(SELECT COUNT(distinct s1.driver_id) as num_hackers
FROM all_data s2
join all_data s1
on s2. order_date = s1. order_date and
(SELECT COUNT(distinct s3. order_date)
FROM all_data s3
WHERE s3.driver_id = s2.driver_id
AND s3. order_date < s1. order_date)
= datediff(s1. order_date, date('2022-01-01'), day)
))
from all_data
I also read this Google BigQuery: Rolling Count Distinct question but that is for a fixed 45 number of days, while the number of days here is a variable based on the date. How do I write a query in BigQuerySQL to find the rolling number of continually active drivers per day?
CodePudding user response:
First find out all combination of dates and drivers, then just get count of all drivers per date. Try this:
select order_date, count(*)
from(
select order_date, driver_id, count(*)
from all_data ad
group by order_date, driver_id)
group by order_date
CodePudding user response:
Consider below
select order_date, count(distinct if(flag, driver_id, null)) active_drivers
from (
select order_date, driver_id,
row_number() over(partition by driver_id order by order_date) -
date_diff(order_date, min(order_date) over(), day) = 1 as flag
from (select distinct order_date, driver_id from all_data)
)
group by order_date