Home > other >  Is there a way to find out if two strings are similar in SQL Server without knowing anything about t
Is there a way to find out if two strings are similar in SQL Server without knowing anything about t

Time:01-05

I'm trying to put together something that pulls related items based off a primary item.

For example, say I've got a really simple [FRUIT] table:

ID NAME
1 Fuji Apples
2 Apple: Golden Delicious
3 Granny Smith Apple
4 Blood Orange
5 Orange: Mandarin

And the user is currently looking at "Fuji Apples". I want to return the rows for "Apple: Golden Delicious" and "Granny Smith Apple" because they also have the word "Apple" in the value of their [Name] column. I guess what I'm looking for is something like LIKE, that does a broader comparison of the strings to see if there's any similar sets of characters.


I've taken a look at SOUNDEX and DIFFERENCE, but they're not what I'm looking for as my strings are too long and the similar word could be anywhere in the string.

If there's nothing that's fine, I can always implement some similarity algorithm if needed; but I don't want to put in the effort if there's already built-in to t-sql.

Note: I am aware in the example above it would make more sense to just add another column and/or table that had the values "Apple" and "Orange"; but that's not what I'm asking about.

CodePudding user response:

Please try the following solution.

It is using XML, XQuery, and Quantified Expressions.

Useful link: Quantified Expressions (XQuery)

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE  (ID INT IDENTITY(1,1) PRIMARY KEY, fruit VARCHAR(200));
INSERT INTO @tbl (fruit) VALUES
('Fuji Apples'),
('Apple: Golden Delicious'),
('Granny Smith Apple'),
('Blood Orange'),
('Orange: Mandarin');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1)
    , @searchFor VARCHAR(30) = 'Fuji Apple';

SELECT t.*
    , c.value('some $r in /root/source/r/text()
                satisfies contains(data(/root/target)[1], $r)', 'BIT') AS Result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><source><r><![CDATA['   REPLACE(@searchFor, @Separator, ']]></r><r><![CDATA[')   ']]></r></source>'
        '<target><r><![CDATA['   REPLACE(fruit, @Separator, ']]></r><r><![CDATA[')   ']]></r></target></root>' AS XML)
      ) AS t1(c);

Output

 ---- ------------------------- -------- 
| ID |          fruit          | Result |
 ---- ------------------------- -------- 
|  1 | Fuji Apples             |      1 |
|  2 | Apple: Golden Delicious |      1 |
|  3 | Granny Smith Apple      |      1 |
|  4 | Blood Orange            |      0 |
|  5 | Orange: Mandarin        |      0 |
 ---- ------------------------- -------- 
  •  Tags:  
  • Related