ID | Status | Online Datestamp | Other Datestamp |
---|---|---|---|
1 | Online | 2022-07-01 | |
1 | Offline | NULL | 2022-08-01 |
2 | Online | 2022-08-03 | |
2 | Unknown | 2022-07-01 | |
3 | Online | 2022-07-03 | |
3 | Online | 2022-07-05 | |
3 | Unknown | NULL | 2022-06-05 |
4 | Unknown | NULL | 2022-06-02 |
5 | Online | 2022-04-04 | |
5 | Online | 2022-04-06 |
The output I'm looking for in SQL/BigQuery.
ID | Status | Online Datestamp |
---|---|---|
2 | Online | 2022-08-03 |
3 | Online | 2022-07-05 |
5 | Online | 2022-04-06 |
So what SQL query I would want to return is the rows for ID 2,3 & 5 as the newest status for that ID is "Online". I only care about getting the rows for IDs that have 'Online' as their newest row.
ID 1 should not be included even though they have one "online" status. As they have an offline status at a newer date.
Furthemore, ID 2 should be included as their online datestamp is newer than the unknown datestamp.
ID 3 should be included as Online is the latest timestamp for that ID. Which is the same for ID 5.
I am only looking for IDs which have Online as their latest status.
I'm finding it difficult as I need to compare between two columns, any thoughts would be appreciated.
But if it is easier, I wouldn't mind an output showing all columns if Online is the only Status, for example. (including all rows of ID 3 and 5).
ID | Status | Online Datestamp |
---|---|---|
2 | Online | 2022-08-03 |
3 | Online | 2022-07-03 |
3 | Online | 2022-07-05 |
5 | Online | 2022-04-04 |
5 | Online | 2022-04-06 |
CodePudding user response:
You can carry it out with a ROW_NUMBER
window function inside a QUALIFY
clause. In order to handle the two column dates, you can use the COALESCE
function.
SELECT ID,
Status,
Online Datestamp
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY COALESCE(OnlineTimestamp, OtherTimestamp) DESC) = 1
CodePudding user response:
Consider below option
select * from your_table
where status != 'Unknown'
qualify countif(status != 'Online') over (partition by id)= 0
and row_number() over (partition by id order by Online_Datestamp desc) = 1
if applied to sample data in your question - output is
But if it is easier, I wouldn't mind an output showing all columns if Online is the only Status,
Note: if you remove line with row_number - output will be
CodePudding user response:
Using HAVING along with ARRAY_AGG,
SELECT id, ARRAY_AGG(STRUCT(status, Online_Datestamp) ORDER BY Online_Datestamp DESC LIMIT 1)[SAFE_OFFSET(0)].*
FROM sample_table, UNNEST([Online_Datestamp, Other_Datestamp]) Online_Datestamp
GROUP BY id
HAVING status = 'Online';
You can get the result following:
Note: You may need type casting for datestamps like SAFE_CAST(Online_Datestamp AS DATE) depending on data in your real table.