Home > Enterprise >  How do I cross join the same table with max timestamps per field?
How do I cross join the same table with max timestamps per field?

Time:11-06

Via SQL, I'm trying to get from this:

id group_id session_id field_label field_value sent_at
1 frosted flakes blue bowl first_name Bob 2022-11-05 18:18:19.093
2 frosted flakes blue bowl first_name Bobby 2022-11-05 18:17:31.274
3 frosted flakes blue bowl last_name Brown 2022-11-05 18:17:16.241
4 frosted flakes blue bowl last_name Browning 2022-11-05 18:15:34.492
5 frosted flakes blue bowl last_name Brownson 2022-11-05 18:14:58.465
6 cheerios green cup first_name Christine 2022-11-05 18:18:58.222
7 cheerios green cup last_name Christmas 2022-11-05 18:20:41.212
8 cheerios green cup last_name Christopherson 2022-11-05 18:24:58.222

where

  • id is unique
  • group_id is not unique
  • session_id is not unique

to this:

group_id session_id amalgamated_field
frosted flakes blue bowl Bob Brown
cheerios green cup Christine Christopherson

Where I know the field_labels that I want to amalgamate, and I want to get the latest value for each amalgamated field label based on sent_at grouped by group_id.

So for group frosted flakes, I want to get the most recent field_value associated with field_label first_name (Bob) and the most recent field_value associated with field_label last_name (Brown).

And repeat.

I tried a cross join and I also tried an inner join similar to this thread. But I keep getting all combinations :/

CodePudding user response:

You may use ROW_NUMBER function with conditional aggregation as the following:

SELECT group_id, session_id,
       CONCAT_WS(' ', 
                  MAX(CASE WHEN field_label='first_name' AND rn=1 THEN field_value END),
                  MAX(CASE WHEN field_label='last_name' AND rn=1 THEN field_value END)
                ) AS amalgamated_field
FROM
(
  SELECT *, 
    ROW_NUMBER() OVER (PARTITION BY group_id, session_id, field_label ORDER BY sent_at DESC) rn
  FROM table_name
) T
GROUP BY group_id, session_id
ORDER BY group_id

See a demo.

CodePudding user response:

In Postgres I would recommend distinct on and string aggregation:

select group_id, session_id, 
    string_agg(field_value, ' ' order by field_label) full_name
from (
    select distinct on (group_id, session_id, field_label) t.*
    from mytable t
    order by group_id, session_id, field_label, sent_at desc
) t
where field_label in ('first_name', 'last_name')
group by group_id, session_id

The distinct on subquery returns the latest row for each group/session/label tuple. In the outer query, we filter on the two labels we are interested in, and aggregate the field values for each group/session tuple. The order by clause of string_agg ensures that the first and last name are concatenated in sequence.

CodePudding user response:

You can first find the maximum timestamp for each field_label, and then perform a self-join of the original table back onto the maximum timestamp, aggregating the field_values for each group_id:

select t1.group_id, t2.session_id, string_agg(t2.field_value,' ') 
from (select t.group_id, t.field_label, max(t.sent_at) d 
   from tbl t group by t.group_id, t.field_label) t1 
join tbl t2 on t1.d = t2.sent_at group by t1.group_id, t2.session_id

See fiddle.

  • Related