Suppose I have a table as follows:
msg.tracking_number | msg.subtag_Message | msg.checkpoint_time |
---|---|---|
12345678 | Delivered | 2022-05-12 |
12345678 | In Transit | 2022-05-11 |
12345678 | In Transit | 2022-05-10 |
12345678 | In Transit | 2022-05-09 |
12345678 | In Transit | 2022-05-08 |
12345678 | Info Received | 2022-05-07 |
87654321 | Delivered | 2022-05-06 |
87654321 | In Transit | 2022-05-05 |
87654321 | In Transit | 2022-05-04 |
87654321 | In Transit | 2022-05-03 |
87654321 | Info Received | 2022-05-02 |
87654321 | Info Received | 2022-05-01 |
In the middle, you see the tags. 'Info received' is always related to the earliest date for a given tracking number. The amount of info received is always variable. Mostly it is just one info received per tracking number, other times it is more than one.
What I want is the date, related to the 'In Transit' tag. Specifically, the earliest (MIN) date from that 'In Transit'.
I already tried the following:
SELECT
A.checkpoint_time,
msg.subtag_message,
msg.tracking_number,
Case
When
msg.subtag_message = 'In Transit'
Then A.checkpoint_time
End as test
FROM `reporting-01.aftership.aftership_data`, UNNEST(msg.checkpoints) as A
However, This gives me all the dates related to the In Transit tags. Again The amount of In Transit related to one tracking number are variable. Mostly 3 but can be more. I am thinking that I have to build a CASE within a CASE that only gives me that one, MIN, in transit date... but I have no clue on how to define this syntax...
The environment is a LOOKER SQL runner database, Original data comes from GBQ.
Any ideas?
CodePudding user response:
Apply rank to each checkpoint_time for each combination of tracking_number and subtag_Message and apply filter based on subtag_Message and the computed rank
WITH ranked_table AS (
SELECT *, RANK() OVER (PARTITION BY tracking_number, subtag_Message ORDER BY checkpoint_time) AS cp_rank
FROM table1
)
SELECT * EXCEPT(cp_rank)
FROM ranked_table
WHERE subtag_Message != "Info Received" OR cp_rank = 1
Output:
tracking_number | subtag_Message | checkpoint_time |
---|---|---|
12345678 | Delivered | 2022-05-12 |
12345678 | In Transit | 2022-05-11 |
12345678 | In Transit | 2022-05-10 |
12345678 | In Transit | 2022-05-09 |
12345678 | In Transit | 2022-05-08 |
12345678 | Info Received | 2022-05-07 |
87654321 | Delivered | 2022-05-06 |
87654321 | In Transit | 2022-05-05 |
87654321 | In Transit | 2022-05-04 |
87654321 | In Transit | 2022-05-03 |
87654321 | Info Received | 2022-05-01 |
CodePudding user response:
You should group by the tracking_number
and subtag_message
:
SELECT
msg.tracking_number,
msg.subtag_message,
MIN(A.checkpoint_time) AS checkpoint_time
FROM `reporting-01.aftership.aftership_data`, UNNEST(msg.checkpoints) as A
WHERE msg.subtag_message = 'In Transit'
GROUP BY msg.tracking_number, msg.subtag_message