I got a table like this one
CaseID | NAME | ADDRESS | ZIP | ROLE |
---|---|---|---|---|
1 | Joe | address_1 | zip_1 | role_1 |
1 | John | address_2 | zip_2 | role_1 |
1 | Jane | address_3 | zip_3 | role_1 |
1 | Bill | address_4 | zip_4 | role_1 |
1 | Bill | address_5 | zip_5 | role_2 |
2 | Bob | address_6 | zip_6 | role_1 |
2 | Shawn | address_7 | zip_7 | role_1 |
I would like to group by the name and CaseID, making a list of the roles in each group. That part is easy. The tricky part is that as you can see for Bill, we have two different addresses and zip. I tried to keep only one with a Max or Min aggregation function inside the group, but there might be inconsistency in the resulting address, keeping zip of one row and the address of the other raw. How can I fetch the zip and address of the same row (which ever) in a group and listing all the roles. I'd like a result like
CaseID | NAME | ADDRESS | ZIP | ROLE |
---|---|---|---|---|
1 | Joe | address_1 | zip_1 | role_1 |
1 | John | address_2 | zip_2 | role_1 |
1 | Jane | address_3 | zip_3 | role_1 |
1 | Bill | address_4 | zip_4 | role_1, role_2 |
2 | Bob | address_6 | zip_6 | role_1 |
2 | Shawn | address_7 | zip_7 | role_1 |
or
CaseID | NAME | ADDRESS | ZIP | ROLE |
---|---|---|---|---|
1 | Joe | address_1 | zip_1 | role_1 |
1 | John | address_2 | zip_2 | role_1 |
1 | Jane | address_3 | zip_3 | role_1 |
1 | Bill | address_5 | zip_5 | role_1, role_2 |
2 | Bob | address_6 | zip_6 | role_1 |
2 | Shawn | address_7 | zip_7 | role_1 |
CodePudding user response:
In Oracle that's a good use case for the keep
syntax to aggregate functions:
select caseid, name,
min(address) keep(dense_rank first order by address) address,
min(zip) keep(dense_rank first order by address) zip,
listagg(role, ', ') within group (order by role) role
from mytable
group by caseid, name
The order by
clause of keep
lets you consistently "pick" a row in the group - that whose address
comes first when sorted against others in the group ; we can repeat the expression across columns to get the zip
of the same row.
Note that the query would benefit a more stable ordering criteria (such as order by address_id
maybe?). Else, from which row should the zip be kept when there are duplicate addresses?
CodePudding user response:
Here's one option; read comments within code.
Sample data:
SQL> with test (caseid, name, address, zip, role) as
2 (select 1, 'Joe' , 'address_1', 'zip_1', 'role_1' from dual union all
3 select 1, 'John', 'address_2', 'zip_2', 'role_1' from dual union all
4 select 1, 'Bill', 'address_4', 'zip_4', 'role_1' from dual union all
5 select 1, 'Bill', 'address_5', 'zip_5', 'role_2' from dual union all
6 select 2, 'Bob' , 'address_6', 'zip_6', 'role_1' from dual
7 ),
Query begins here:
8 temp as
9 -- RN will later be used in correlated subqueries to fetch
10 -- ADDRESS and ZIP values that belong to the same row
11 (select caseid, name, address, zip, role,
12 row_number() over (partition by caseid, name order by address) rn
13 from test
14 )
15 -- finally, fetch what you can from the TEMP CTE; use correlated subqueries
16 -- for the rest and aggregate ROLE values
17 select a.caseid,
18 a.name,
19 --
20 (select b.address
21 from temp b
22 where b.caseid = a.caseid
23 and b.name = a.name
24 and b.rn = 1
25 ) address,
26 --
27 (select b.zip
28 from temp b
29 where b.caseid = a.caseid
30 and b.name = a.name
31 and b.rn = 1
32 ) zip,
33 --
34 listagg(role, ', ') within group (order by role) role
35 from temp a
36 group by a.caseid, a.name
37 /
Result:
CASEID NAME ADDRESS ZIP ROLE
---------- ---- --------- ----- ---------------
1 Joe address_1 zip_1 role_1
1 Bill address_4 zip_4 role_1, role_2
1 John address_2 zip_2 role_1
2 Bob address_6 zip_6 role_1
SQL>
CodePudding user response:
S.t. like
select
caseid, name,
min(address) keep (dense_rank first order by address, zip),
min(zip) keep (dense_rank first order by address, zip),
listagg(role)
from <crappy table>
group by caseid, name