I have a table that has a list of subscribers who have a phone under contract as of the end of each month. I need to calculate how many unique contract phones the subscriber had from the beginning as of the end of each month in BigQuery.
Sample Dataset:
Month ID | Phone Model | Sub ID | Contract Start Date |
---|---|---|---|
May 2022 | X1 | S1 | 2022-04-01 |
Apr 2022 | X1 | S1 | 2022-04-01 |
Mar 2022 | X2 | S1 | 2022-02-01 |
Feb 2022 | X2 | S1 | 2022-02-01 |
May 2022 | X3 | S2 | 2022-04-01 |
Apr 2022 | X3 | S2 | 2022-04-01 |
Mar 2022 | X3 | S2 | 2022-03-01 |
Feb 2022 | X4 | S2 | 2022-02-01 |
Desired output:
Month ID | Sub ID | count of distinct phones as of month end |
---|---|---|
May 2022 | S1 | 2 |
Apr 2022 | S1 | 2 |
Mar 2022 | S1 | 1 |
Feb 2022 | S1 | 1 |
May 2022 | S2 | 2 |
Apr 2022 | S2 | 2 |
Mar 2022 | S2 | 2 |
Feb 2022 | S2 | 1 |
Please not that in case of sub ID - S2, there are 3 contract start dates which ideally should mean 3 devices but there are only 2 devices. As a result the count for May & Apr 2022 for S2 should be 2 instead of 3
CodePudding user response:
Consider below approach
select
format_date('%B %Y', month) as month_id, sub_id,
count(distinct phone_model) as count_of_distinct_phones_as_of_month_end
from your_table,
unnest(generate_date_array(
date_trunc(contract_start_date, month),
date_trunc(current_date(), month),
interval 1 month
)) month
group by month_id, sub_id
order by sub_id, parse_date('%B %Y', month_id) desc
if applied to sample data in your question - output is
CodePudding user response:
You can try it with a window function, I guess.
DECLARE sample ARRAY<STRUCT<month_id STRING, phone_model STRING, sub_id STRING, contract_start_date DATE>>
DEFAULT [
('May 2022', 'X1', 'S1', DATE '2022-04-01'),
('Apr 2022', 'X1', 'S1', '2022-04-01'),
('Mar 2022', 'X2', 'S1', '2022-02-01'),
('Feb 2022', 'X2', 'S1', '2022-02-01'),
('May 2022', 'X3', 'S2', '2022-04-01'),
('Apr 2022', 'X3', 'S2', '2022-04-01'),
('Mar 2022', 'X3', 'S2', '2022-03-01'),
('Feb 2022', 'X4', 'S2', '2022-02-01')
];
SELECT month_id, sub_id,
(SELECT COUNT(DISTINCT p) FROM UNNEST(phones) p) AS count_of_distinct_phones_as_of_month_end
FROM (
SELECT *, ARRAY_AGG(phone_model) OVER (PARTITION BY sub_id ORDER BY PARSE_DATE('%B %Y', month_id)) AS phones
FROM UNNEST(sample)
);
output will be: