Home > Software engineering >  Only chose 'Online' if it is the latest date in the database
Only chose 'Online' if it is the latest date in the database

Time:08-04

ID Status Online Datestamp Other Datestamp
1 Online 2022-07-01
1 Offline NULL 2022-08-01
2 Online 2022-08-03
2 Unknown 2022-07-01
3 Online 2022-07-03
3 Online 2022-07-05
3 Unknown NULL 2022-06-05
4 Unknown NULL 2022-06-02
5 Online 2022-04-04
5 Online 2022-04-06

The output I'm looking for in SQL/BigQuery.

ID Status Online Datestamp
2 Online 2022-08-03
3 Online 2022-07-05
5 Online 2022-04-06

So what SQL query I would want to return is the rows for ID 2,3 & 5 as the newest status for that ID is "Online". I only care about getting the rows for IDs that have 'Online' as their newest row.

ID 1 should not be included even though they have one "online" status. As they have an offline status at a newer date.

Furthemore, ID 2 should be included as their online datestamp is newer than the unknown datestamp.

ID 3 should be included as Online is the latest timestamp for that ID. Which is the same for ID 5.

I am only looking for IDs which have Online as their latest status.

I'm finding it difficult as I need to compare between two columns, any thoughts would be appreciated.

But if it is easier, I wouldn't mind an output showing all columns if Online is the only Status, for example. (including all rows of ID 3 and 5).

ID Status Online Datestamp
2 Online 2022-08-03
3 Online 2022-07-03
3 Online 2022-07-05
5 Online 2022-04-04
5 Online 2022-04-06

CodePudding user response:

You can carry it out with a ROW_NUMBER window function inside a QUALIFY clause. In order to handle the two column dates, you can use the COALESCE function.

SELECT ID,
       Status,
       Online Datestamp
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY COALESCE(OnlineTimestamp, OtherTimestamp) DESC) = 1

CodePudding user response:

Consider below option

select * from your_table
where status != 'Unknown'
qualify countif(status != 'Online') over (partition by id)= 0   
and row_number() over (partition by id order by Online_Datestamp desc) = 1

if applied to sample data in your question - output is

enter image description here

But if it is easier, I wouldn't mind an output showing all columns if Online is the only Status,

Note: if you remove line with row_number - output will be

enter image description here

CodePudding user response:

Using HAVING along with ARRAY_AGG,

SELECT id, ARRAY_AGG(STRUCT(status, Online_Datestamp) ORDER BY Online_Datestamp DESC LIMIT 1)[SAFE_OFFSET(0)].*
  FROM sample_table, UNNEST([Online_Datestamp, Other_Datestamp]) Online_Datestamp
 GROUP BY id
HAVING status = 'Online';

You can get the result following:

enter image description here

Note: You may need type casting for datestamps like SAFE_CAST(Online_Datestamp AS DATE) depending on data in your real table.

  • Related