How to write a Postgresql query to select the Max value row from a grouping of subrows-CodePudding

so this is what I am trying to do.

I have a Postgresql DB containing metadata information. Here are the demo schemas:

CREATE TABLE testfiles (
    dockey SERIAL PRIMARY KEY,
    filekey VARCHAR(255) NOT NULL,
    version_num INTEGER NOT NULL,
    version_id VARCHAR(255) NOT NULL,
    size INTEGER NOT NULL,
    modified_date VARCHAR(255) NOT NULL,
    modified_by VARCHAR(255) NOT NULL
)

The table is currently populated with this info:

# colnames=
['dockey', 'filekey', 'version_num', 'version_id', 'size', 'modified_date', 'modified_by']
-   (1, 'myDoc.txt', 1, '1a', 16, '08/09/2022', 'Jonathan')
-   (2, 'myDoc.txt', 12, '1a', 16, '08/09/2022', 'Jonathan')
-   (3, 'myDoc.txt', 13, '1b', 16, '08/08/2022', 'Jonathan')
-   (4, 'myDoc.txt', 14, '1c', 16, '08/09/2022', 'Jonathan')
-   (5, 'myDoc.txt', 15, '1d', 16, '08/08/2022', 'Jonathan')
-   (6, 'myDoc2.txt', 16, '1e', 16, '08/09/2022', 'Jonathan')
-   (7, 'myDoc2.txt', 17, '1f', 16, '08/08/2022', 'Jonathan')
-   (8, 'myDoc2.txt', 18, '1g', 16, '08/09/2022', 'Jonathan')
-   (9, 'myDoc2.txt', 19, '1h', 16, '08/08/2022', 'Jonathan')
-   (10, 'myDoc2.txt', 20, '1i', 16, '08/09/2022', 'Jonathan')

What I want to do is group the filekey's together, and then find the row with the Max versionNumber.

So an example correct output would be

-   (5, 'myDoc.txt', 15, '1d', 16, '08/08/2022', 'Jonathan')
-   (10, 'myDoc2.txt', 20, '1i', 16, '08/09/2022', 'Jonathan')

because it is grouping myDoc.txt & myDoc2.txt, then finding the largest version number (15 and 20) for each group.

Using normal SQL statements, I should be able to call:

"SELECT *, MAX(version_num) FROM testfiles GROUP BY filekey"

and it will do what I am wanting when tested on a sample SQL DB https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_min

The only thing I can think of is that the above SELECT statement will not work because I am running a Postgresql DB rather than a standard MySQL DB.

Lastly, this is the error I receive when trying to run the SELECT statement on our Postgresql DB:

"SELECT *, MAX(version_num) FROM testfiles GROUP BY filekey"


  "error": "column \"testfiles.dockey\" must appear in the GROUP BY clause or be
 used in an aggregate function
\nLINE 1: SELECT MAX(version_num), * FROM testfiles GROUP BY filekey\n                                 ^\n"

Thank you so much to anyone who can help!

CodePudding user response：

You can try using CTE. Below code might be useful for you..

with cte1 as (
    select *,
          row_number() over (partition by filekey order by dockey) asrn
    from testfiles
          )
          select *
          from cte1 
          where rn in (select max(rn) from cte1)
          order by dockey

CodePudding user response：

WITH B AS 
 (SELECT 
     filekey,            
     MAX(version_num) max_version FROM testfiles 
GROUP BY 
     filekey) 
SELECT
     A.* FROM testfiles A INNER JOIN B ON A.filekey=B.filekey AND A.version_num=B.max_version_num;