Home > OS >  SQL first date for the first specific occurrence of a value
SQL first date for the first specific occurrence of a value

Time:05-11

Suppose I have a table as follows:

msg.tracking_number msg.subtag_Message msg.checkpoint_time
12345678 Delivered 2022-05-12
12345678 In Transit 2022-05-11
12345678 In Transit 2022-05-10
12345678 In Transit 2022-05-09
12345678 In Transit 2022-05-08
12345678 Info Received 2022-05-07
87654321 Delivered 2022-05-06
87654321 In Transit 2022-05-05
87654321 In Transit 2022-05-04
87654321 In Transit 2022-05-03
87654321 Info Received 2022-05-02
87654321 Info Received 2022-05-01

In the middle, you see the tags. 'Info received' is always related to the earliest date for a given tracking number. The amount of info received is always variable. Mostly it is just one info received per tracking number, other times it is more than one.

What I want is the date, related to the 'In Transit' tag. Specifically, the earliest (MIN) date from that 'In Transit'.

I already tried the following:

    SELECT  
    A.checkpoint_time,
    msg.subtag_message,
    msg.tracking_number,
    
    Case 
    When
    msg.subtag_message = 'In Transit'
    Then A.checkpoint_time
    End as test
    
    FROM `reporting-01.aftership.aftership_data`, UNNEST(msg.checkpoints) as A

However, This gives me all the dates related to the In Transit tags. Again The amount of In Transit related to one tracking number are variable. Mostly 3 but can be more. I am thinking that I have to build a CASE within a CASE that only gives me that one, MIN, in transit date... but I have no clue on how to define this syntax...

The environment is a LOOKER SQL runner database, Original data comes from GBQ.

Any ideas?

CodePudding user response:

Apply rank to each checkpoint_time for each combination of tracking_number and subtag_Message and apply filter based on subtag_Message and the computed rank

WITH ranked_table AS (
  SELECT *, RANK() OVER (PARTITION BY tracking_number, subtag_Message ORDER BY checkpoint_time) AS cp_rank
  FROM table1
)

SELECT * EXCEPT(cp_rank)
FROM ranked_table
WHERE subtag_Message != "Info Received" OR cp_rank = 1
Output:
tracking_number subtag_Message checkpoint_time
12345678 Delivered 2022-05-12
12345678 In Transit 2022-05-11
12345678 In Transit 2022-05-10
12345678 In Transit 2022-05-09
12345678 In Transit 2022-05-08
12345678 Info Received 2022-05-07
87654321 Delivered 2022-05-06
87654321 In Transit 2022-05-05
87654321 In Transit 2022-05-04
87654321 In Transit 2022-05-03
87654321 Info Received 2022-05-01

CodePudding user response:

You should group by the tracking_number and subtag_message:

SELECT
    msg.tracking_number,
    msg.subtag_message,
    MIN(A.checkpoint_time) AS checkpoint_time
FROM `reporting-01.aftership.aftership_data`, UNNEST(msg.checkpoints) as A
WHERE msg.subtag_message = 'In Transit'
GROUP BY msg.tracking_number, msg.subtag_message
  • Related