I recently ran into a problem where I have to group by 2 columns with similar values. The columns can swap values anytime but I still need to group them.
For example, consider there is a duo that completes certain tasks
COL 1 | COL 2 | COL 3
User 1 | User 2 | Task 1
User 1 | User 2 | Task 2
User 1 | User 2 | Task 3
User 2 | User 1 | Task 4
in the above example, User 1 and User 2 have worked together 4 times and have completed 4 tasks. I am using GROUP BY on COL1 and COL2 but it is grouping the first 3 results and is leaving out the fourth one because the columns have swapped values. How can I deal with this?
CodePudding user response:
You can do something like this. get all possible combinations and take one condition based on userIds. I haven't handled null but if that's a requirement you can use nvl
or coalesce
and add a dummy value to handle the cases
SELECT COL1, COL2, COUNT(TASK) FROM
(
SELECT COL1, COL2, TASK FROM users
UNION
SELECT COL2, COL1, TASK FROM users
)a WHERE COL1 < COL2 OR COL1 = COL2
GROUP BY COL1,COL2
Output:
User1 User2 4
CodePudding user response:
You swap the two columns so that smaller value becomes col1 and the other value becomes col2, then group:
SELECT col1_new, col2_new, COUNT(*)
FROM T
CROSS APPLY (SELECT
CASE WHEN col1 < col2 THEN col1 ELSE col2 END,
CASE WHEN col1 < col2 THEN col2 ELSE col1 END
) AS x(col1_new, col2_new)
GROUP BY col1_new, col2_new
If CROSS APPLY
or lateral joins are not available then you need to copy-paste the expressions inside the select and group by clause.
CodePudding user response:
Use least()
and greatest()
to make sure col1 user < col2 user. Then GROUP BY
:
select least(COL1, COL2), greatest(COL1, COL2), count(*)
from tablename
group by least(COL1, COL2), greatest(COL1, COL2)