Home > Mobile >  Count Distinct IDs in a date range given a start and end time
Count Distinct IDs in a date range given a start and end time

Time:06-17

I have a BigQuery table like this

id start_date end_date location type
1 2022-01-01 2022-01-01 MO mobile
1 2022-01-01 2022-01-02 MO mobile
2 2022-01-02 2022-01-03 AZ laptop
3 2022-01-03 2022-01-03 AZ mobile
3 2022-01-03 2022-01-03 AZ mobile
3 2022-01-03 2022-01-03 AZ mobile
2 2022-01-02 2022-01-03 CA laptop
4 2022-01-02 2022-01-03 CA mobile
5 2022-01-02 2022-01-03 CA laptop

I want to return the number of unique IDs by location and type of an arbitrary date range.

The issue I have is that there are multiple repeating lines covering similar dates, like the first two rows above.

For example, a date range of 2022-01-02 and 2022-01-03 would return

location type count distinct ID
AZ laptop 1
AZ mobile 1
CA laptop 2
CA mobile 1
MO mobile 1

I first tried creating a list of dates in like

WITH dates AS (SELECT * FROM UNNEST(GENERATE_DATE_ARRAY(DATE_TRUNC(DATE_SUB(CURRENT_DATE('Europe/London'), INTERVAL 3 MONTH),MONTH), DATE_SUB(CURRENT_DATE('Europe/London'), INTERVAL 1 DAY), INTERVAL 1 DAY)) AS cal_date)

and using ROW_NUMBER() OVER (PARTITION BY id,start_date,end_date) to expose only unique rows.

But I was only able to return the number of unique IDs for each day, rather than looking at the full date range as a whole.

I then tried joining to the same cte to return a unique row for each date, so something like

date | id | start_date | end_date | location | type

Where the columns from the first table above are duplicated for each date but this would require generating a huge number of rows to then further work with.

What is the correct way to acheive the desired result?

CodePudding user response:

I think the simplest way is

select location, type, count(distinct id) distinct_ids
from your_table, unnest(generate_date_array(start_date, end_date)) effective_date
where effective_date between '2022-01-02' and '2022-01-03'
group by location, type           

with output (if applied to sample data in your question)

enter image description here

  • Related