First I want to remove duplicates by selecting the last date and then remove the remaining duplicates by selecting the role 'BLUE'
Example Table:
ID | STATUS | ROLE | DATE |
---|---|---|---|
1 | ACTIVE | BLUE | Oct 20 2022 |
1 | ACTIVE | RED | Dec 20 2022 |
2 | ACTIVE | BLUE | Feb 02 2022 |
2 | ACTIVE | RED | Feb 02 2022 |
3 | INACTIVE | BLUE | Dec 03 2022 |
4 | ACTIVE | RED | Dec 04 2022 |
Expected result:
ID | STATUS | ROLE | DATE |
---|---|---|---|
1 | ACTIVE | RED | Dec 20 2022 |
2 | ACTIVE | BLUE | Feb 02 2022 |
3 | INACTIVE | BLUE | Dec 03 2022 |
4 | ACTIVE | RED | Dec 04 2022 |
This is what I have so far:
SELECT a.ID,
a.STATUS,
a.ROLE,
a.DATE
FROM
(
SELECT ID, Max(DATE) as MaxDate
FROM WorkersTest
GROUP BY ID
) b
INNER JOIN WorkersTest as a
ON a.ID = b.ID
AND a.DATE = b.MaxDate
ORDER BY b."ID"
Then as you can see I still need to add the second filter/subquery...
CodePudding user response:
First we can use following subquery to get each id with its latest date:
SELECT id, MAX(date) AS maxDate
FROM yourtable
GROUP BY id;
This can basically be used in the whole query like this:
SELECT y.id, y.status, y.role,
FORMAT(y.date, 'MMM dd yyyy') AS date
FROM yourtable y
JOIN
(SELECT id, MAX(date) AS maxDate
FROM yourtable
GROUP BY id) grouped
ON y.id = grouped.id
AND y.date = grouped.maxDate
ORDER BY y.id;
But this will list both "blue" and "red" role in case they both have the same latest date.
Therefore, the result would be incorrect:
ID | STATUS | ROLE | DATE |
---|---|---|---|
1 | ACTIVE | RED | Dec 20 2022 |
2 | ACTIVE | BLUE | Feb 02 2022 |
2 | ACTIVE | RED | Feb 02 2022 |
3 | INACTIVE | BLUE | Dec 03 2022 |
4 | ACTIVE | RED | Dec 04 2022 |
So, to also satisfy the condition to only show the "blue" row in this case, there are different options. One of them would be to use a further subquery with a window function as for example ROW_NUMBER
.
This could become necessary if there are further roles.
In our specific case with two roles only, we don't need this, but can use MIN
instead because "blue" appears before "red" (if we would like to get "red" rows instead, we would use MAX
).
So the query is now this one:
SELECT y.id, y.status,
MIN(y.role) AS role,
FORMAT(y.date, 'MMM dd yyyy') AS date
FROM yourtable y
JOIN
(SELECT id, MAX(date) AS maxDate
FROM yourtable
GROUP BY id) grouped
ON y.id = grouped.id
AND y.date = grouped.maxDate
GROUP BY y.id, y.status, y.date
ORDER BY y.id;
This will produce the correct result:
ID | STATUS | ROLE | DATE |
---|---|---|---|
1 | ACTIVE | RED | Dec 20 2022 |
2 | ACTIVE | BLUE | Feb 02 2022 |
3 | INACTIVE | BLUE | Dec 03 2022 |
4 | ACTIVE | RED | Dec 04 2022 |
We can replicate this here: db<>fiddle
A general hint: If possible, we should avoid to use SQL key words as column name or table name (here "role" and "date").
Especially the name "date" is also not meaningful because it misses to tell us which kind of date. We should therefore prefer clear names like for example "sellDate" or "quittingDate".