Home > Software engineering >  SQL SELECT with JOIN and PARTITION
SQL SELECT with JOIN and PARTITION

Time:08-03

in my project I got 2 tables (postgresql) - partners and campaigns. Each partner may have many campaigns, but each campaign may have only one partner. The data looks like the following:

partners:
id | name
=========
1    Partner_1
2    Partner_2    


campaigns:
id | name   | partner_id | started_at  
=============================================================
1    camp_1    1            2022-07-06 00:00:00.000000  00:00
2    camp_2    1            2022-07-07 00:00:00.000000  00:00  
3    camp_3    2            2022-08-08 00:00:00.000000  00:00
4    camp_4    2            2022-08-09 00:00:00.000000  00:00 

The task is for each partner select only 1 campaign with the most recent started_at date. So the final result would look like the following:

partner_name | campaign_started_at
==================================
Partner_1      2022-07-07 00:00:00.000000  00:00
Partner_2      2022-08-09 00:00:00.000000  00:00

For these purposes I guess I need to use a window function with select subquery and Group By statements and I tried the following:

SELECT public.partners.name, 
       public.campaigns.started_at as camp_started_at, 
       rank() OVER (PARTITION BY public.partners.name ORDER BY public.campaigns.started_at DESC)
FROM public.partners 
  JOIN campaigns ON campaigns.partner_id = partners.id 
GROUP BY partners.name, public.campaigns.started_at;

but it did not produce the desired result. Any ideas how to fix it would be welcome.

CodePudding user response:

On Postgres, we don't even need to use RANK or ROW_NUMBER for this, we can just use DISTINCT ON:

SELECT DISTINCT ON (p.id) p.name AS partner_name,
                          c.started_at AS campaign_started_at
FROM partners p
INNER JOIN campaigns c ON c.partner_id = p.id
ORDER BY p.id, c.started_at DESC;

CodePudding user response:

You can simply use MAX() and GROUP BY partners.name to get your desired output.

SELECT public.partners.name
     , MAX(public.campaigns.started_at) as camp_started_at 
  FROM public.partners 
  JOIN campaigns ON campaigns.partner_id = partners.id 
 GROUP BY partners.name
;
  • Related