in my project I got 2 tables (postgresql) - partners and campaigns. Each partner may have many campaigns, but each campaign may have only one partner. The data looks like the following:
partners:
id | name
=========
1 Partner_1
2 Partner_2
campaigns:
id | name | partner_id | started_at
=============================================================
1 camp_1 1 2022-07-06 00:00:00.000000 00:00
2 camp_2 1 2022-07-07 00:00:00.000000 00:00
3 camp_3 2 2022-08-08 00:00:00.000000 00:00
4 camp_4 2 2022-08-09 00:00:00.000000 00:00
The task is for each partner select only 1 campaign with the most recent started_at date. So the final result would look like the following:
partner_name | campaign_started_at
==================================
Partner_1 2022-07-07 00:00:00.000000 00:00
Partner_2 2022-08-09 00:00:00.000000 00:00
For these purposes I guess I need to use a window function with select subquery and Group By statements and I tried the following:
SELECT public.partners.name,
public.campaigns.started_at as camp_started_at,
rank() OVER (PARTITION BY public.partners.name ORDER BY public.campaigns.started_at DESC)
FROM public.partners
JOIN campaigns ON campaigns.partner_id = partners.id
GROUP BY partners.name, public.campaigns.started_at;
but it did not produce the desired result. Any ideas how to fix it would be welcome.
CodePudding user response:
On Postgres, we don't even need to use RANK
or ROW_NUMBER
for this, we can just use DISTINCT ON
:
SELECT DISTINCT ON (p.id) p.name AS partner_name,
c.started_at AS campaign_started_at
FROM partners p
INNER JOIN campaigns c ON c.partner_id = p.id
ORDER BY p.id, c.started_at DESC;
CodePudding user response:
You can simply use MAX()
and GROUP BY partners.name
to get your desired output.
SELECT public.partners.name
, MAX(public.campaigns.started_at) as camp_started_at
FROM public.partners
JOIN campaigns ON campaigns.partner_id = partners.id
GROUP BY partners.name
;