Home > front end >  How to aggregate a column based on values of other columns - PostgreSQL
How to aggregate a column based on values of other columns - PostgreSQL

Time:07-16

Say I have a table:

date fruit status numberOfFruits
2022-01 apple ripe 3
2022-01 banana mature 5
2022-01 pear ripe 10
2022-01 grapes mature 9
2022-02 apple ripe 3
2022-02 banana mature 3
2022-02 pear ripe 3
2022-02 grapes mature 7

I want to be able to create a query that adds 3 aggregate columns (AVG, MIN, MAX) that aggregates the column numberOfFruits based on their status and date, ordered by fruit and date. The output of the table should be:

date fruit status numberOfFruits AvgNumOfFruits MaxNumOfFruits MinNumOfFruits
2022-01 apple ripe 3 6.5 10 3
2022-01 pear ripe 10 6.5 10 3
2022-01 banana mature 5 7 9 5
2022-01 grapes mature 9 7 9 5
2022-02 apple ripe 3 3 3 3
2022-02 pear ripe 3 3 3 3
2022-02 banana mature 3 5 7 3
2022-02 grapes mature 7 5 7 3

I'm at a loss and here's what I have so far:

SELECT 
    date, 
    fruit, 
    status,
    numberOfFruits,
    AVG(CASE WHEN "status" = 'ripe' THEN "numberOfFruits" ELSE "numberOfFruits" END) as AvgNumOfFruits, 
    MIN(CASE WHEN "status" = 'ripe' THEN "numberOfFruits" ELSE "numberOfFruits" END) as MingNumOfFruits,
    MAX(CASE WHEN "status" = 'ripe' THEN "numberOfFruits" ELSE "numberOfFruits" END) as MaxNumOfFruits
FROM fruitdata
GROUP BY 1, 2, 3, 4
ORDER BY date, status

The query just repeats the numberofFruits for those 3 aggregate columns. It should be like "take the average of the column numberOfFruits that has the status of ripe, as well as the date 2022-01, and put it in a new column called AvgNumOfFruits. Somehow I can't translate this to SQL.

Any advice/tips/help would be appreciated. Thanks in advance!

CodePudding user response:

Aggregation functions are employed to aggregate rows (combining rows to get aggregated values). In your case you should rather use window functions, that compute values over windows (partitions/groups of rows), though without aggregating the rows.

SELECT *,
       AVG(numberOfFruits) OVER(PARTITION BY date, status) AS AvgNumOfFruits, 
       MAX(numberOfFruits) OVER(PARTITION BY date, status) AS MaxNumOfFruits,
       MIN(numberOfFruits) OVER(PARTITION BY date, status) AS MinNumOfFruits 
FROM fruitdata
ORDER BY date,
         status DESC

The corresponding window functions need to be computed on groups of "date" and "status" fields. On the other hand, the ordering is completely optional and up to you (ORDER BY clause).

Check the demo here.

  • Related