I have a table that records all the different statuses for a list of Jobs with timestamps. So the ID column has many Ids that appear several times as their status changes such as with the 'xyz' job below that went through several status changes.
JobId | Status | Timestamp |
---|---|---|
xyz | pending | 1:00 |
xyz | reviewed | 1:02 |
xyz | cancelled | 1:04 |
abc | pending | 4:30 |
abc | active | 5:30 |
I want to add on a 4th column called CurrentStatus that will show the latest status for each Id, so in the end the table would look like this.
JobId | Status | Timestamp | CurrrentStatus |
---|---|---|---|
xyz | pending | 1:00 | cancelled |
xyz | reviewed | 1:02 | cancelled |
xyz | cancelled | 1:04 | cancelled |
abc | pending | 4:30 | active |
abc | active | 5:30 | active |
So far I wrote the following query to create a LatestTimeStamp column that prints out the latest timestamp partitioned by each Id.
SELECT *, MAX(Timestamp) OVER(PARTITION BY JobId) AS LatestTimeStamp
FROM `data.sample`
JobId | Status | Timestamp | LatestTimeStamp |
---|---|---|---|
xyz | pending | 1:00 | 1:04 |
xyz | reviewed | 1:02 | 1:04 |
xyz | cancelled | 1:04 | 1:04 |
abc | pending | 4:30 | 5:30 |
abc | active | 5:30 | 5:30 |
It produced the table above but now how do I use that info to grab the actual status instead of the timestamp? or is there a more efficient method to produce a column with the current status for each id, maybe using correlated queries or CASE statements?
I can't figure out a query that works so any help would be appreciated.
CodePudding user response:
Use below
select *,
first_value(Status) over(partition by JobId order by Timestamp desc) as CurrrentStatus
from your_table
if applied to sample data in your question - output is