How to aggregate using distinct values across two columns?

Time:10-05

I have the following data in an orders table:

revenue       expenses        location_1      location_2
3             6               London          New York
6             11              Paris           Toronto
1             8               Houston         Sydney
1             4               Chicago         Los Angeles
2             5               New York        London
7             11              New York        Boston
4             6               Toronto         Paris
5             11              Toronto         New York
1             2               Los Angeles     London
0             0               Mexico City     London

I would like to create a result set that has 3 columns:

  1. a list of the 10 DISTINCT city names
  2. the sum of revenue for each city
  3. the sum of expenses for each city

The desired result is:

location         revenue    expenses
London           6          13
New York         17         33
Paris            10         17
Toronto          15         28
Houston          1          8
Sydney           1          8
Chicago          1          4
Los Angeles      2          6
Boston           7          11
Mexico City      0          0

Is it possible to aggregate on distinct values across two columns? If yes, how would I do it?

Here is a fiddle:
http://sqlfiddle.com/#!9/0b1105/1

CodePudding user response:

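Stack both location columns with a single UNION ALL, then aggregate once. This is shorter (and often faster) than aggregating each column separately: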

SELECT location, sum(revenue) AS rev, sum(expenses) AS exp
FROM  (
   SELECT location_1 AS location, revenue, expenses FROM orders
   UNION ALL
   SELECT location_2            , revenue, expenses FROM orders
   ) sub
GROUP  BY 1;

This variant reads the table only once, through a CTE, and may be faster:

WITH cte AS (
   SELECT location_1, location_2, revenue AS rev, expenses AS exp
   FROM orders
   )
SELECT location, sum(rev) AS rev, sum(exp) AS exp
FROM  (
   SELECT location_1 AS location, rev, exp FROM cte
   UNION ALL
   SELECT location_2            , rev, exp FROM cte
   ) sub
GROUP  BY 1;

The CTE is materialized, and that adds overhead which may outweigh the benefit of reading the table only once. Which query wins depends on many factors, such as total table size, available indexes, possible bloat, available RAM, storage speed, and Postgres version.
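To check the first query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. Several things are assumptions rather than part of the original: it runs on SQLite instead of Postgres, the column types (INTEGER, TEXT) are guessed, the helper name `city_totals` is hypothetical, `GROUP BY 1` is spelled out as `GROUP BY location`, and an `ORDER BY` is added only to make the output stable.

```python
import sqlite3

# Sample rows copied from the question:
# (revenue, expenses, location_1, location_2)
ROWS = [
    (3, 6, "London", "New York"),
    (6, 11, "Paris", "Toronto"),
    (1, 8, "Houston", "Sydney"),
    (1, 4, "Chicago", "Los Angeles"),
    (2, 5, "New York", "London"),
    (7, 11, "New York", "Boston"),
    (4, 6, "Toronto", "Paris"),
    (5, 11, "Toronto", "New York"),
    (1, 2, "Los Angeles", "London"),
    (0, 0, "Mexico City", "London"),
]

# The UNION ALL query from the answer, with the grouping column named
# explicitly and an ORDER BY added for deterministic output.
QUERY = """
SELECT location, sum(revenue) AS rev, sum(expenses) AS exp
FROM  (
   SELECT location_1 AS location, revenue, expenses FROM orders
   UNION ALL
   SELECT location_2            , revenue, expenses FROM orders
   ) sub
GROUP  BY location
ORDER  BY location;
"""


def city_totals():
    """Return {location: (revenue_sum, expenses_sum)} for the sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column types are an assumption; the question does not state them.
        conn.execute(
            "CREATE TABLE orders ("
            "revenue INTEGER, expenses INTEGER, "
            "location_1 TEXT, location_2 TEXT)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)
        return {loc: (rev, exp) for loc, rev, exp in conn.execute(QUERY)}
    finally:
        conn.close()


if __name__ == "__main__":
    for location, (rev, exp) in city_totals().items():
        print(f"{location:<12} {rev:>3} {exp:>4}")
```

Each order row contributes once per column it appears in, so a city listed in both columns of different rows (London, for example) collects revenue and expenses from all of those rows.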


CodePudding user response:

You could aggregate each location column separately, combine the two results with UNION ALL, and then aggregate the combined rows once more:

select location, sum(rev) as rev, sum(exp) as exp
from (
  select location_1 as location, sum(revenue) as rev, sum(expenses) as exp
  from orders
  group by location_1
  union all
  select location_2 as location, sum(revenue) as rev, sum(expenses) as exp
  from orders
  group by location_2
) z
group by location
order by 1