Home > Enterprise >  Get multiple missing dates from bigquery sql query
Get multiple missing dates from bigquery sql query

Time:11-05

I have a BigQuery table that keeps track of user sign ins:

user_id user_name user_sign_in_date
1 john doe 2019-04-05
1 john doe 2019-04-06
2 bob bobson 2019-04-05
2 bob bobson 2019-04-08
3 jane beer 2019-04-05
3 jane beer 2019-04-06
3 jane beer 2019-04-07
3 jane beer 2019-04-08
4 amy face 2019-04-09

I want to be able to get users who have not signed in on certain dates and list the dates they have missed. For example, if I want to see users who have not signed in between 2019-04-05 and 2019-04-08, I would need a SQL query that could help me generate something like:

[
  {
    'user_id': 1,
    'user_name': 'john doe',
    'dates_not_signed_in': ['2019-04-07', '2019-04-08']
  },
  {
    'user_id': 2,
    'user_name': 'bob bobson',
    'dates_not_signed_in': ['2019-04-06', '2019-04-07']
  },
  {
    'user_id': 4,
    'user_name': 'amy face',
    'dates_not_signed_in': ['2019-04-05', '2019-04-06', '2019-04-07', '2019-04-08']
  }
]

I think I need to do something like

SELECT user_id
FROM table

to get all the user IDs but then I'm not sure how to surround that query so that I only get back users who didn't sign in on certain dates with the dates they missed.

CodePudding user response:

You can use generate_date_array to enumerate the dates, and then check whether each user signed in at each date.

In normal usage, you would have a separate table to store the users, like users(user_id, user_name). I would also assume that the sign-in dates have a time portion. So:

select u.user_id, u.user_name, array_agg(dt order by dt) dates_not_signed_in
from users u
cross join unnest(generate_date_array('2019-04-05', '2019-04-08', interval 1 day)) as dt
where not exists (
    select 1 
    from sign_ins s 
    where s.user_sign_in_date >= dt and s.s.user_sign_in_date < dt   interval 1 day
)
group by u.user_id, u.user_name

CodePudding user response:

Consider also below approach

select * from (
  select user_id, user_name,
    array_to_string(array(
      select cast(day as string) from unnest(generate_date_array('2019-04-05', '2019-04-08')) day
      except distinct select cast(day as string) from unnest(arr) day
    ), ', ') as dates_not_signed_in
  from (
    select user_id, user_name, array_agg(user_sign_in_date) arr
    from your_table
    group by user_id, user_name
  )
)
where dates_not_signed_in != ''            

if applied to sample data in your question - output is

enter image description here

  • Related