I have table like this:
id | full_name |
---|---|
1 | John Smith |
2 | Smith John |
3 | Jim Jonson |
4 | JimJonson |
I want to get something like this:
id | full_name |
---|---|
1 | John Smith |
3 | Jim Jonson |
So, I need SELECT DISTINCT full_name FROM table
, so that
John Smith
andSmith John
to be one and the same, alsoJim Jonson
andJimJonson
I hope I explained it well. Can you help me?
CodePudding user response:
Step-by-step. Read comments within code.
SQL> with test (id, full_name) as
2 -- sample data
3 (select 1, 'John Smith' from dual union all
4 select 2, 'Smith John' from dual union all
5 select 3, 'Jim Jonson' from dual union all
6 select 4, 'JimJonson' from dual
7 ),
8 temp as
9 -- split full name to rows
10 (select id,
11 regexp_substr(full_name, '[^ ] ', 1, column_value) val,
12 column_value cv
13 from test cross join
14 table(cast(multiset(select level from dual
15 connect by level <= regexp_count(full_name, ' ') 1
16 ) as sys.odcinumberlist))
17 ),
18 temp2 as
19 -- aggregate full name with no space between "words"
20 (select id,
21 listagg(val, '') within group (order by val) full_name
22 from temp
23 group by id
24 ),
25 temp3 as
26 -- fetch only distinct values
27 (select min(b.id) id,
28 b.full_name
29 from temp2 b
30 group by b.full_name
31 )
32 -- finally, join TEMP3 and sample data
33 select b.id,
34 a.full_name
35 from test a join temp3 b on b.id = a.id
36 order by a.id;
ID FULL_NAME
---------- ----------------------------------------
1 John Smith
3 Jim Jonson
SQL>
CodePudding user response:
You can split the full_name values by initial capitals of name and surnames in unpivoted manner, and sort alphabetically, and combine by using LISTAGG()
function, and apply MIN()
aggregation at the last step such as
WITH t(id,full_name) AS
(
SELECT 1, 'John Smith' FROM dual UNION ALL
SELECT 2, 'Smith John' FROM dual UNION ALL
SELECT 3, 'Jim Jonson' FROM dual UNION ALL
SELECT 4, 'JimJonson' FROM dual
), t2 AS
(
SELECT id,
TRIM(SUBSTR(full_name,column_value,LEAD(column_value,1,LENGTH(full_name)) OVER (PARTITION BY id ORDER BY id,column_value)-1)) AS names
FROM t,
TABLE(CAST(MULTISET(SELECT REGEXP_INSTR(full_name,'[A-Z] ',1,level)
FROM dual
CONNECT BY level <= REGEXP_COUNT(full_name,'[A-Z]')) AS sys.odcivarchar2list ))
), t3 AS
(
SELECT id, LISTAGG(names,' ') WITHIN GROUP (ORDER BY id,names) AS full_name
FROM t2
GROUP BY id
)
SELECT MIN(id) AS min_id, full_name
FROM t3
GROUP BY full_name
ORDER BY min_id