Home > Back-end >  How to use DISTINCT ON but ORDER BY another expression?
How to use DISTINCT ON but ORDER BY another expression?

Time:11-24

The model Subscription has_many SubscriptionCart.

A SubscriptionCart has a status and an authorized_at date.

I need to pick the cart with the oldest authorized_at date from all the carts associated to a Subscription, and then I have to order all the returned Subscription results by this subscription_carts.authorized_at column.

The query below is working but I can't figure out how to select DISTINCT ON subscription.id to avoid duplicates but ORDER BY subscription_carts.authorized_at .

raw sql query so far:

select distinct on (s.id) s.id as subscription_id, subscription_carts.authorized_at, s.*
from subscriptions s
join subscription_carts subscription_carts on subscription_carts.subscription_id = s.id 
and subscription_carts.plan_id = s.plan_id
where subscription_carts.status = 'processed'
and s.status IN ('authorized','in_trial', 'paused')
order by s.id, subscription_carts.authorized_at

If I try to ORDER BY subscription_carts.authorized_at first, I get an error because the DISTINCT ON and ORDER BY expressions must be in the same order.

The solutions I've found seem too complicated for what I need and I've failed to implement them because I don't understand them fully.

Would it be better to GROUP BY subscription_id and then pick from that group instead of using DISTINCT ON? Any help appreciated.

CodePudding user response:

This requirement is necessary to make DISTINCT ON work; to change the final order, you can add an outer query with another ORDER BY clause:

SELECT *
FROM (SELECT DISTINCT ON (s.id)
             s.id as subscription_id, subscription_carts.authorized_at, s.*
      FROM subscriptions s
         JOIN ...
      WHERE ...
      ORDER BY s.id, subscription_carts.authorized_at
     ) AS subq
ORDER BY authorized_at;

CodePudding user response:

You don't have to use DISTINCT ON. While it is occasionally useful, I personally find window function based approaches much more clear:

-- Optionally, list all columns explicitly, to remove the rn column again
SELECT *
FROM (
  SELECT
    s.id AS subscription_id,
    c.authorized_at,
    s.*,
    ROW_NUMBER () OVER (PARTITION BY s.id ORDER BY c.authorized_at) rn
  FROM subscriptions s
  JOIN subscription_carts c
  ON c.subscription_id = s.id
  AND c.plan_id = s.plan_id
  WHERE c.status = 'processed'
  AND s.status IN ('authorized', 'in_trial', 'paused')
) t
WHERE rn = 1
ORDER BY subscription_id, authorized_at
  • Related