Is there a way to compare the similarity between sentences in sql?

Time:11-07

Is there a way to compare the similarity between sentences in SQL? I have a large dataset and I need to identify instances where two or more sentences share similar words.

enter image description here

How do I tell SQL to only return the values below?

enter image description here

From what I have googled, there may be a way to do this using Full-Text Search and Semantic Search, but I have not been able to find an article that addresses what I am trying to achieve.

Could someone in the group provide an example or point me to an article that could help? Better yet, is what I am trying to do even achievable in SQL?

CodePudding user response:

No, there is not.

Part of the problem is that "similarity" is a complex concept, and measuring it requires a program to analyze the sentences, POSSIBLY with months of programming. You give pretty simplistic examples - grats. Even that is not as easy as you think. What about "the small boy wear red t-shirt" - would "small boy" count as a difference or not?

This requires a LOT of work and a LOT of definition, or a LOT of training of, possibly, a multi-layer neural network.

SQL is generally awful at string manipulation - the best you get is SOUNDEX, and that reduces a string to a four-character code (the first letter plus three digits), so it only reflects how the string begins. (RTFM, it is actually QUITE interesting how it works, but that design makes it absolutely unsuitable for anything like comparing sentences.)
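To see why a four-character code cannot distinguish sentences, here is a minimal sketch of classic American Soundex in plain Python. It is an illustration, not any database's implementation: the function name `soundex` is made up for this example, and real SQL engines differ in details such as how they treat spaces and punctuation (this sketch simply ignores non-letters).

```python
def soundex(text: str) -> str:
    """Classic American Soundex: first letter plus three digits.

    Assumption: non-letters (spaces, hyphens) are skipped, so a whole
    sentence is treated as one long word. Databases may behave differently.
    """
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters_only = [c for c in text.upper() if c.isalpha()]
    if not letters_only:
        return ""

    result = letters_only[0]
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")  # vowels and Y have no code
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
        if len(result) == 4:
            break  # everything after the fourth character is ignored
    return result.ljust(4, "0")


if __name__ == "__main__":
    for s in ("the small boy wears a red t-shirt", "the smelly dog barked"):
        print(s, "->", soundex(s))
```

Both sample sentences reduce to the same code because they start with the same few consonant sounds; nothing after that point is looked at.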

So, no - this is simply way outside the scope of anything in SQL. You will have to download the data and use an out-of-SQL approach (which is also a LOT better suited to this type of work).

You can obviously work around this limitation with simplistic SQL such as @ASH was suggesting - but that is not looking for "similar sentences"; it is working around specific markers that ARE SPECIFIC TO YOUR DATA SET. This is overfitting, and it bypasses the question you have asked.
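As one possible starting point for an out-of-SQL approach, the sketch below scores sentence pairs by word overlap (Jaccard similarity: shared words divided by all distinct words). Everything in it is an assumption made for illustration: the function names, the sample sentences, and the 0.5 threshold are hypothetical, and word overlap is only one crude definition of "similar" that knows nothing about meaning.

```python
import re
from itertools import combinations


def words(sentence: str) -> set:
    """Lowercase the sentence and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words; 0.0 if both are empty."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def similar_pairs(sentences, threshold=0.5):
    """Return (sentence_a, sentence_b, score) for pairs at or above threshold.

    The threshold is an arbitrary illustrative value; tune it on real data.
    """
    pairs = []
    for a, b in combinations(sentences, 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical rows, standing in for data exported from the database.
    rows = [
        "the boy wears a red shirt",
        "the boy wears a blue shirt",
        "stock prices fell",
    ]
    for a, b, score in similar_pairs(rows):
        print(f"{score:.2f}  {a!r} ~ {b!r}")
```

Comparing every pair is quadratic in the number of rows, so a large dataset would need some form of pre-grouping or indexing before this becomes practical.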

CodePudding user response:

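You can try the SOUNDEX function. Look up how SOUNDEX works and then judge whether it fits your case. Each row has to be compared against other rows, so the table is joined to itself; comparing a column to itself within one row is always true. The query is:

SELECT a.Sentence AS sentence_a, b.Sentence AS sentence_b
FROM your_table AS a
JOIN your_table AS b
  ON SOUNDEX(a.Sentence) = SOUNDEX(b.Sentence)
WHERE a.Sentence < b.Sentence;  -- skip self-matches and mirrored pairs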