Home > Back-end >  Aggregate multiple detail rows in a join query
Aggregate multiple detail rows in a join query

Time:08-24

I have a couple of tables where I'm applying inner join to fetch matching rows:

event (id, url, name, eventStart, eventEnd)`  
slot (id, event_id date, start_time, end_time,
    location_type, location_id)` 

Query:

select * from event as evnt 
inner join slot as slot on evnt.id = slot.event_id
where location_type = 'type of location' and slot.location_id = '12345'
and slot.event_id = :eventId

This works fine. But sometimes a single event may have multiple slots. The result would be multiple rows for the same event. All column values in slot are the same for the same event, except date, start_time, end_time. I tried aggregating slot values like below, and this works fine:

select event_id,
    array_agg(concat(date,':',start_time,':',end_time)) as slotDates
from slot
where event_id = 'event id' and location_id = '12345' 
group by event_id 

When I join to the main query, it gives me an error:

select event_id,
    array_agg(concat(date,':',start_time,':',end_time)) as slotDates,
    evnt.id as eventId, evnt.url as eventUrl,
    evnt.name as eventName, evnt.event_start as eventStart,
    evnt.event_end as eventEnd, slot.location_type as locationType  
from event as evnt 
inner join slot as slot on evnt.id = slot.event_id
where slot.location_type = 'type of location'
and slot.location_id = '12345'
and slot.event_id = 'event id goes here'
group by slot.event_id

ERROR: column "evnt.id" must appear in the GROUP BY clause or be used in an aggregate function

How to fix this?

CodePudding user response:

The error occurred because many fields you used in the SELECT clause did not appear in the GROUP BY clause. It is required because if a group has more than one row, it will not know which to keep and which to drop. One easy solution is to put everything in the GROUP BY clause. But I am totally against it. You can get relevant information by conducting a self-join.

-- Modify the SELECT clause to get what you want
select *
from
(
    select event_id, array_agg(concat(date,':',start_time,':',end_time)) as slotDates 
    from slot 
    -- where event_id = 'event id' and location_id = '12345'
    -- Warning that WHERE clause will be processed before GROUP BY
    -- Consider add/remove the WHERE clause here based on your requirement
    group by event_id 
) as slotdates
join event on slotdates.event_id = event.id
join slot on event.id = slot.event_id
where event_id = 'event id' and location_id = '12345'

CodePudding user response:

Schema (PostgreSQL v14)

CREATE TABLE event(
id integer,
url text,
name text, 
event_start  date, 
event_end   date
);

CREATE TABLE slot(
id integer,
event_id integer,
date date,
start_time  date, 
end_time   date,
location_type text, 
location_id integer
);

INSERT INTO event(id, url, name, event_start, event_end) VALUES(1, 'URL', 'NAME', CURRENT_DATE, CURRENT_DATE);

INSERT INTO slot(id, event_id,date, end_time, start_time, location_type, location_id) VALUES(1, 1, CURRENT_DATE   1, CURRENT_DATE   2, CURRENT_DATE   3, 'type of location', 12345);

INSERT INTO slot(id, event_id,date, end_time, start_time, location_type, location_id) VALUES(2, 1, CURRENT_DATE   4, CURRENT_DATE   5, CURRENT_DATE   6, 'type of location', 12345);

Query

select 
    evnt.id as eventId, 
    evnt.url as eventUrl, 
    evnt.name as eventName, 
    evnt.event_start as eventStart, 
    evnt.event_end as eventEnd,
    agg_slot.slotDates, 
    agg_slot.location_type
from event as evnt,
     (select event_id, 
             location_type,
             location_id,
             array_agg(concat(slot.date,':',slot.start_time,':',slot.end_time)) as slotDates 
       from slot as slot       
       group by event_id, location_id, location_type ) as agg_slot
where evnt.id = agg_slot.event_id and evnt.id = 1
and agg_slot.location_type = 'type of location' and agg_slot.location_id = '12345';
eventid eventurl eventname eventstart eventend slotdates location_type
1 URL NAME 2022-08-22T00:00:00.000Z 2022-08-22T00:00:00.000Z 2022-08-23:2022-08-25:2022-08-24,2022-08-26:2022-08-28:2022-08-27 type of location

View on DB Fiddle

Or in case you need to get only one single aggregated column

select 
    evnt.id as eventId, 
    evnt.url as eventUrl, 
    evnt.name as eventName, 
    evnt.event_start as eventStart, 
    evnt.event_end as eventEnd,    
    (select array_agg(concat(slot.date,':',slot.start_time,':',slot.end_time))
       from slot as slot
       where slot.event_id = evnt.id and  slot.location_id = '12345' and slot.location_type = 'type of location'
       group by event_id) as slotdates    
from event as evnt
where  evnt.id = 1;
eventid eventurl eventname eventstart eventend slotdates
1 URL NAME 2022-08-22T00:00:00.000Z 2022-08-22T00:00:00.000Z 2022-08-23:2022-08-25:2022-08-24,2022-08-26:2022-08-28:2022-08-27

CodePudding user response:

I suggest a LATERAL subquery:

SELECT *
FROM   event e
LEFT   JOIN LATERAL (
   SELECT min(s.location_type) AS locationtype
        , array_agg(concat_ws(':', date, start_time, end_time)) AS slotdates
   FROM   slot s
   WHERE  s.location_type = 'type of location'  -- slot filters here
   AND    s.location_id = 12345
   AND    s.event_id = e.id
   ) s ON true
WHERE  e.id = 1;  -- event id goes here

db<>fiddle here

This always returns the event if its id exists. If there are no qualifying slots, locationtype and slotdates are NULL. That's typically more practical.

If you don't want the event without slots, use CROOS JOIN instead of LEFT JOIN (and drop ON true). See:

Either way, the point is to join after aggregating details, so we have no GROUP BY in the outer SELECT to begin with, thereby avoiding your original problem - in cheaper fashion.

  • Related