I have two columns and I would like to make a comparison to find out if they are equal, the problem is when same keyword is written in different way.
For example if column_1 = 123 Maryland Ave and column_2 = 123 Maryland Avenue these two column should be equal and I would like to create a third column in the query to show if they are equal or not, thank you for your help!
Column_1 Column_2 Equal?
-----------------------------------------------------------
123 Maryland Ave 123 Maryland Avenue Yes
456 Maryland Ave 123 Maryland Ave No
CodePudding user response:
One option is to check similarity between those values:
SQL> with test (id, col1, col2) as
2 (select 1, '123 Maryland Ave', '123 Maryland Avenue' from dual union all
3 select 2, '456 Maryland Ave', '123 Maryland Ave' from dual
4 )
5 select id, col1, col2,
6 utl_match.jaro_winkler_similarity(col1, col2) as sim
7 from test;
ID COL1 COL2 SIM
---------- ---------------- ------------------- ----------
1 123 Maryland Ave 123 Maryland Avenue 96
2 456 Maryland Ave 123 Maryland Ave 87
SQL>
Now, you have to decide the threshold which satisfies your needs. Is it 90%? Let's suppose it is. Then you'd use CASE
expression:
SQL> with test (id, col1, col2) as
2 (select 1, '123 Maryland Ave', '123 Maryland Avenue' from dual union all
3 select 2, '456 Maryland Ave', '123 Maryland Ave' from dual
4 )
5 select id, col1, col2,
6 case when utl_match.jaro_winkler_similarity(col1, col2) > 90 then 'Yes'
7 else 'No'
8 end as equal
9 from test;
ID COL1 COL2 EQUAL
---------- ---------------- ------------------- -------
1 123 Maryland Ave 123 Maryland Avenue Yes
2 456 Maryland Ave 123 Maryland Ave No
SQL>