How to use GROUP BY on two columns even if their values are swapped-CodePudding

I recently ran into a problem where I have to group by 2 columns with similar values. The columns can swap values anytime but I still need to group them.

For example, consider there is a duo that completes certain tasks

COL 1   |  COL 2   |  COL 3
User 1  |  User 2  |  Task 1
User 1  |  User 2  |  Task 2
User 1  |  User 2  |  Task 3
User 2  |  User 1  |  Task 4

in the above example, User 1 and User 2 have worked together 4 times and have completed 4 tasks. I am using GROUP BY on COL1 and COL2 but it is grouping the first 3 results and is leaving out the fourth one because the columns have swapped values. How can I deal with this?

CodePudding user response：

You can do something like this. get all possible combinations and take one condition based on userIds. I haven't handled null but if that's a requirement you can use nvl or coalesce and add a dummy value to handle the cases

SELECT COL1, COL2, COUNT(TASK) FROM 
(
  SELECT COL1, COL2, TASK FROM users
  UNION
  SELECT COL2, COL1, TASK FROM users
)a WHERE COL1 < COL2 OR COL1 = COL2
GROUP BY COL1,COL2

Output:

User1 User2 4

Sql Fiddle

CodePudding user response：

You swap the two columns so that smaller value becomes col1 and the other value becomes col2, then group:

SELECT col1_new, col2_new, COUNT(*)
FROM T
CROSS APPLY (SELECT
    CASE WHEN col1 < col2 THEN col1 ELSE col2 END,
    CASE WHEN col1 < col2 THEN col2 ELSE col1 END
) AS x(col1_new, col2_new)
GROUP BY col1_new, col2_new

If CROSS APPLY or lateral joins are not available then you need to copy-paste the expressions inside the select and group by clause.

CodePudding user response：

Use least() and greatest() to make sure col1 user < col2 user. Then GROUP BY:

select least(COL1, COL2), greatest(COL1, COL2), count(*)
from tablename
group by least(COL1, COL2), greatest(COL1, COL2)