Home > Enterprise >  Converting Similar names to canonical form in SQL
Converting Similar names to canonical form in SQL

Time:02-22

I have a database of books. There can be situations where an author can use variations of their names as author.

For eg. Albert Einstein can be written as Albert E. or A. Einstein

Now I need to create a method such that SQL database and tell me the ID of the author based on such situations.

Is there a built in way to convert these multiple variations of a name into a single type within SQL language such as PostgreSQL?

Or Should I change the name using some another language such as C, C or Python to convert the name into a particular form and then send to database?

I would rather have a method within SQL instead of using some another language

CodePudding user response:

You could use PostgreSQL full text search with a thesaurus dictionary. The configuration file would then contain

Albert E. : Albert Einstein
A. Einstein : Albert Einstein

CodePudding user response:

Wikipedia says:

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.[1] The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms (in part because it is a standard feature of popular database software such as DB2, PostgreSQL,[2] MySQL,[3] SQLite,[4] Ingres, MS SQL Server,[5] Oracle.[6] and SAP ASE.[7]) Improvements to Soundex are the basis for many modern phonetic algorithms

https://en.wikipedia.org/wiki/Soundex

  • Related