I have a BigQuery table that keeps track of user sign ins:
user_id | user_name | user_sign_in_date |
---|---|---|
1 | john doe | 2019-04-05 |
1 | john doe | 2019-04-06 |
2 | bob bobson | 2019-04-05 |
2 | bob bobson | 2019-04-08 |
3 | jane beer | 2019-04-05 |
3 | jane beer | 2019-04-06 |
3 | jane beer | 2019-04-07 |
3 | jane beer | 2019-04-08 |
4 | amy face | 2019-04-09 |
I want to be able to get users who have not signed in on certain dates and list the dates they have missed. For example, if I want to see users who have not signed in between 2019-04-05 and 2019-04-08, I would need a SQL query that could help me generate something like:
[
{
'user_id': 1,
'user_name': 'john doe',
'dates_not_signed_in': ['2019-04-07', '2019-04-08']
},
{
'user_id': 2,
'user_name': 'bob bobson',
'dates_not_signed_in': ['2019-04-06', '2019-04-07']
},
{
'user_id': 4,
'user_name': 'amy face',
'dates_not_signed_in': ['2019-04-05', '2019-04-06', '2019-04-07', '2019-04-08']
}
]
I think I need to do something like
SELECT user_id
FROM table
to get all the user IDs but then I'm not sure how to surround that query so that I only get back users who didn't sign in on certain dates with the dates they missed.
CodePudding user response:
You can use generate_date_array
to enumerate the dates, and then check whether each user signed in at each date.
In normal usage, you would have a separate table to store the users, like users(user_id, user_name)
. I would also assume that the sign-in dates have a time portion. So:
select u.user_id, u.user_name, array_agg(dt order by dt) dates_not_signed_in
from users u
cross join unnest(generate_date_array('2019-04-05', '2019-04-08', interval 1 day)) as dt
where not exists (
select 1
from sign_ins s
where s.user_sign_in_date >= dt and s.s.user_sign_in_date < dt interval 1 day
)
group by u.user_id, u.user_name
CodePudding user response:
Consider also below approach
select * from (
select user_id, user_name,
array_to_string(array(
select cast(day as string) from unnest(generate_date_array('2019-04-05', '2019-04-08')) day
except distinct select cast(day as string) from unnest(arr) day
), ', ') as dates_not_signed_in
from (
select user_id, user_name, array_agg(user_sign_in_date) arr
from your_table
group by user_id, user_name
)
)
where dates_not_signed_in != ''
if applied to sample data in your question - output is