so this is what I am trying to do.
I have a Postgresql DB containing metadata information. Here are the demo schemas:
CREATE TABLE testfiles (
dockey SERIAL PRIMARY KEY,
filekey VARCHAR(255) NOT NULL,
version_num INTEGER NOT NULL,
version_id VARCHAR(255) NOT NULL,
size INTEGER NOT NULL,
modified_date VARCHAR(255) NOT NULL,
modified_by VARCHAR(255) NOT NULL
)
The table is currently populated with this info:
# colnames=
['dockey', 'filekey', 'version_num', 'version_id', 'size', 'modified_date', 'modified_by']
- (1, 'myDoc.txt', 1, '1a', 16, '08/09/2022', 'Jonathan')
- (2, 'myDoc.txt', 12, '1a', 16, '08/09/2022', 'Jonathan')
- (3, 'myDoc.txt', 13, '1b', 16, '08/08/2022', 'Jonathan')
- (4, 'myDoc.txt', 14, '1c', 16, '08/09/2022', 'Jonathan')
- (5, 'myDoc.txt', 15, '1d', 16, '08/08/2022', 'Jonathan')
- (6, 'myDoc2.txt', 16, '1e', 16, '08/09/2022', 'Jonathan')
- (7, 'myDoc2.txt', 17, '1f', 16, '08/08/2022', 'Jonathan')
- (8, 'myDoc2.txt', 18, '1g', 16, '08/09/2022', 'Jonathan')
- (9, 'myDoc2.txt', 19, '1h', 16, '08/08/2022', 'Jonathan')
- (10, 'myDoc2.txt', 20, '1i', 16, '08/09/2022', 'Jonathan')
What I want to do is group the filekey's together, and then find the row with the Max versionNumber.
So an example correct output would be
- (5, 'myDoc.txt', 15, '1d', 16, '08/08/2022', 'Jonathan')
- (10, 'myDoc2.txt', 20, '1i', 16, '08/09/2022', 'Jonathan')
because it is grouping myDoc.txt & myDoc2.txt, then finding the largest version number (15 and 20) for each group.
Using normal SQL statements, I should be able to call:
"SELECT *, MAX(version_num) FROM testfiles GROUP BY filekey"
and it will do what I am wanting when tested on a sample SQL DB https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_min
The only thing I can think of is that the above SELECT statement will not work because I am running a Postgresql DB rather than a standard MySQL DB.
Lastly, this is the error I receive when trying to run the SELECT statement on our Postgresql DB:
"SELECT *, MAX(version_num) FROM testfiles GROUP BY filekey"
"error": "column \"testfiles.dockey\" must appear in the GROUP BY clause or be
used in an aggregate function
\nLINE 1: SELECT MAX(version_num), * FROM testfiles GROUP BY filekey\n ^\n"
Thank you so much to anyone who can help!
CodePudding user response:
You can try using CTE. Below code might be useful for you..
with cte1 as (
select *,
row_number() over (partition by filekey order by dockey) asrn
from testfiles
)
select *
from cte1
where rn in (select max(rn) from cte1)
order by dockey
CodePudding user response:
WITH B AS
(SELECT
filekey,
MAX(version_num) max_version FROM testfiles
GROUP BY
filekey)
SELECT
A.* FROM testfiles A INNER JOIN B ON A.filekey=B.filekey AND A.version_num=B.max_version_num;