Fuzzy `==`, or "almost_equal_to" function for strings?

Time:02-14

I want to search for duplicates in my database, but the duplicates may not match exactly. For example:

"The smallest thing, and nothing more"
"The Smallest Things, And Nothing More"
"The smallest thing, and nothing more."
"The smallest thing, and nothing"

Is there an easy way to design a fuzzy == function that returns a match weight instead of a binary true/false result?

CodePudding user response:

Ruby ships with a library called did_you_mean. It is used to suggest corrections when you make a mistake in your code; for example, "abc".downcsae will ask you "Did you mean downcase?"

This library includes a module called DidYouMean::Levenshtein, which has a method called distance. The distance is the number of single-character edits (insertions, deletions, or substitutions) required to turn one string into the other. Example:

s = "The smallest thing, and nothing more"
x = "The Smallest Things, And Nothing More"

DidYouMean::Levenshtein.distance(s, x)
#=> 6
DidYouMean::Levenshtein.distance(s.downcase, x.downcase)
#=> 1

This may be useful in your case, although you would need to choose a threshold that suits your data.
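Since the question asks for a weight rather than a yes/no answer, one option is to scale the distance by the length of the longer string. The following is only a sketch: similarity is a hypothetical helper name, not part of did_you_mean, and dividing by the longer length is just one of several reasonable normalizations.

```ruby
require "did_you_mean"

# Hypothetical helper (not part of did_you_mean): maps the edit distance
# to a score between 0.0 (nothing in common) and 1.0 (identical).
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero? # two empty strings are identical

  1.0 - DidYouMean::Levenshtein.distance(a, b).to_f / longest
end
```

With the strings above, similarity(s.downcase, x.downcase) comes out at roughly 0.97, because a single edit separates them and the longer string has 37 characters.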

An implementation is also available via the Gem::Text module, which you could extend in a class if needed, e.g.:

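require "rubygems/text"

class MyClass
  extend Gem::Text # exposes levenshtein_distance as a class method

  # True when x and y are at most `threshold` edits apart.
  def self.fuzzy_equal?(x:, y:, threshold: 3)
    levenshtein_distance(x, y) <= threshold
  end
end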

MyClass.fuzzy_equal?(x: s, y: x)
#=> false
MyClass.fuzzy_equal?(x: s.downcase, y: x.downcase)
#=> true
MyClass.fuzzy_equal?(x: s, y: x, threshold: 10)
#=> true
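
To apply this to the duplicate search in the question, it may help to normalize each string first (case, punctuation, whitespace) so that only real differences count toward the distance. A rough sketch, assuming the candidate strings are already loaded into an array; normalize and duplicate_candidates are hypothetical names, and the default of 2 edits is an arbitrary starting point:

```ruby
require "did_you_mean"

# Hypothetical helper: lowercases, drops punctuation, collapses whitespace.
def normalize(str)
  str.downcase.gsub(/[^[:alnum:]\s]/, "").gsub(/\s+/, " ").strip
end

# Hypothetical helper: returns every pair of strings whose normalized
# forms are at most max_distance edits apart.
def duplicate_candidates(strings, max_distance: 2)
  strings.combination(2).select do |a, b|
    DidYouMean::Levenshtein.distance(normalize(a), normalize(b)) <= max_distance
  end
end
```

This compares every pair, so the work grows quadratically with the number of rows. That is probably fine for a few thousand strings, but a large table would likely need some way to narrow down the candidates first.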