How to select column values of a group based on another column value-CodePudding

I got a table like this one

CaseID	NAME	ADDRESS	ZIP	ROLE
1	Joe	address_1	zip_1	role_1
1	John	address_2	zip_2	role_1
1	Jane	address_3	zip_3	role_1
1	Bill	address_4	zip_4	role_1
1	Bill	address_5	zip_5	role_2
2	Bob	address_6	zip_6	role_1
2	Shawn	address_7	zip_7	role_1

I would like to group by the name and CaseID, making a list of the roles in each group. That part is easy. The tricky part is that as you can see for Bill, we have two different addresses and zip. I tried to keep only one with a Max or Min aggregation function inside the group, but there might be inconsistency in the resulting address, keeping zip of one row and the address of the other raw. How can I fetch the zip and address of the same row (which ever) in a group and listing all the roles. I'd like a result like

CaseID	NAME	ADDRESS	ZIP	ROLE
1	Joe	address_1	zip_1	role_1
1	John	address_2	zip_2	role_1
1	Jane	address_3	zip_3	role_1
1	Bill	address_4	zip_4	role_1, role_2
2	Bob	address_6	zip_6	role_1
2	Shawn	address_7	zip_7	role_1

CaseID	NAME	ADDRESS	ZIP	ROLE
1	Joe	address_1	zip_1	role_1
1	John	address_2	zip_2	role_1
1	Jane	address_3	zip_3	role_1
1	Bill	address_5	zip_5	role_1, role_2
2	Bob	address_6	zip_6	role_1
2	Shawn	address_7	zip_7	role_1

CodePudding user response：

In Oracle that's a good use case for the keep syntax to aggregate functions:

select caseid, name, 
    min(address) keep(dense_rank first order by address) address,
    min(zip)     keep(dense_rank first order by address) zip,
    listagg(role, ', ') within group (order by role) role
from mytable
group by caseid, name

The order by clause of keep lets you consistently "pick" a row in the group - that whose address comes first when sorted against others in the group ; we can repeat the expression across columns to get the zip of the same row.

Note that the query would benefit a more stable ordering criteria (such as order by address_id maybe?). Else, from which row should the zip be kept when there are duplicate addresses?

Demo on DB Fiddle

CodePudding user response：

Here's one option; read comments within code.

Sample data:

SQL> with test (caseid, name, address, zip, role) as
  2    (select 1, 'Joe' , 'address_1', 'zip_1', 'role_1' from dual union all
  3     select 1, 'John', 'address_2', 'zip_2', 'role_1' from dual union all
  4     select 1, 'Bill', 'address_4', 'zip_4', 'role_1' from dual union all
  5     select 1, 'Bill', 'address_5', 'zip_5', 'role_2' from dual union all
  6     select 2, 'Bob' , 'address_6', 'zip_6', 'role_1' from dual
  7    ),

Query begins here:

  8  temp as
  9    -- RN will later be used in correlated subqueries to fetch
 10    -- ADDRESS and ZIP values that belong to the same row
 11    (select caseid, name, address, zip, role,
 12       row_number() over (partition by caseid, name order by address) rn
 13     from test
 14    )
 15  -- finally, fetch what you can from the TEMP CTE; use correlated subqueries
 16  -- for the rest and aggregate ROLE values
 17  select a.caseid,
 18    a.name,
 19    --
 20    (select b.address
 21     from temp b
 22     where b.caseid = a.caseid
 23       and b.name = a.name
 24       and b.rn = 1
 25    ) address,
 26    --
 27    (select b.zip
 28     from temp b
 29     where b.caseid = a.caseid
 30       and b.name = a.name
 31       and b.rn = 1
 32    ) zip,
 33    --
 34    listagg(role, ', ') within group (order by role) role
 35  from temp a
 36  group by a.caseid, a.name
 37  /

Result:

    CASEID NAME ADDRESS   ZIP   ROLE
---------- ---- --------- ----- ---------------
         1 Joe  address_1 zip_1 role_1
         1 Bill address_4 zip_4 role_1, role_2
         1 John address_2 zip_2 role_1
         2 Bob  address_6 zip_6 role_1

SQL>

CodePudding user response：

S.t. like

select 
   caseid, name, 
   min(address) keep (dense_rank first order by address, zip), 
   min(zip) keep (dense_rank first order by address, zip), 
   listagg(role)
from <crappy table> 
group by caseid, name