Home > Mobile >  SQL exclude rows containing nearly duplicate values filtered by grouped max values
SQL exclude rows containing nearly duplicate values filtered by grouped max values

Time:06-28

I have a table like this...

ID User_ID Skill Skill_Level Skill_ID
1 1 Project Manager 3 1
2 4 Teacher 6 2
3 2 Teacher 5 2
4 3 Administrator 2 3
5 3 Project Manager 3 1
6 1 Project Manager 4 1
7 4 Deputy Head 5 4
8 2 Teacher 6 2

For regulatory reasons we cannot delete old (historic) rows. I need to run a query that just returns the set of "User_ID" "Skill" and "Best Skill-Level" (i.e. max value per person). The successful query should look like this... So, some people can have more than one skill (User_ID 3 and 4 in this example) and some people become more skilled in their skill (User_ID 1 and 2).

ID User_ID Skill Skill_Level Skill_ID
2 4 Teacher 6 2
4 3 Administrator 2 3
5 3 Project Manager 3 1
6 1 Project Manager 4 1
7 4 Deputy Head 5 4
8 2 Teacher 6 2

Have you a suggestion about the most efficient and shortest query code that can achieve this? There is a table [dbo].[skills] with PK skill_ID_PK a table [dbo].[people] with PK user_ID_PK and a table [dbo].[skillHistory] with FK skill_ID_PK and FK user_ID_PK.

Any help most welcomed.

CodePudding user response:

Using a SQL Generator for Snowflake I found that using QUALIFY was the recommendation, but I don't think that works in MSSQL.

To do this without QUALIFY, you might try something like this:

MSSQL

SELECT *
FROM (
    SELECT *,
    ROW_NUMBER() OVER (PARTITION BY User_ID, Skill ORDER BY Skill_Level DESC) AS RN
    FROM table
    ) AS T
WHERE T.RN = 1;

For RDBMS that support QUALIFY,

SELECT 
  * 
FROM 
  table QUALIFY ROW_NUMBER() OVER (
    PARTITION BY User_ID, Skill 
    ORDER BY Skill_Level DESC
  ) = 1

  • Related