Ruby Anagrams puzzle-CodePudding

i a m a begginner in ruby please i need directions in how to get the program to return a list containing "inlets"

Question: Given a word and a list of possible anagrams, select the correct sublist.

Given "listen" and a list of candidates like "enlists" "google" "inlets" "banana" the program should return a list containing "inlets".

This is what i have been able to do

puts 'Enter word'
word_input = gets.chomp
puts 'Enter anagram_list'
potential_anagrams = gets.chomp
potential_anagrams.each do |anagram|
end

this is how the program should behave assuming my word_input was "hello" but i do not know how to get this working.

is_anagram? word: 'hello', anagrams: ['helo', 'elloh', 'heelo', 'llohe']
# => 'correct anagrams are: elloh, llohe'

Would really appreciate ideas.

CodePudding user response：

As hinted in comments, a string can be easily converted to an array of its characters.

irb(main):005:0> "hello".chars
=> ["h", "e", "l", "l", "o"]
irb(main):006:0> "lleho".chars
=> ["l", "l", "e", "h", "o"]

Arrays can be easily sorted.

irb(main):007:0> ["h", "e", "l", "l", "o"].sort
=> ["e", "h", "l", "l", "o"]
irb(main):008:0> ["l", "l", "e", "h", "o"].sort
=> ["e", "h", "l", "l", "o"]

And arrays can be compared.

irb(main):009:0> ["e", "h", "l", "l", "o"] == ["e", "h", "l",
l", "o"]
=> true

Put all of this together and you should be able to determine if one word is an anagram of another. You can then pair this with #select to find the anagrams in an array. Something like:

def is_anagram?(word, words)
  words.select do |w|
    ...
  end
end

CodePudding user response：

I took the tack of creating a class in pursuit of re-usability. This is overkill for a one-off usage, but allows you to build up a set of known words and then poll it as many times as you like for anagrams of multiple candidate words.

This solution is built on hashes and sets, using the sorted characters of a word as the index to a set of words sharing the same letters. Hashing is O(1), Sets are O(1), and if we view words as having a bounded length the calculation of the key is also O(1), yielding an overall complexity of a constant time per word.

I've commented the code, but if anything is unclear feel free to ask.

require 'set'

class AnagramChecker
  def initialize
    # A hash whose default value is an empty set object
    @known_words = Hash.new { |h, k| h[k] = Set.new }
  end

  # Create a key value from a string by breaking it into individual
  # characters, sorting, and rejoining, so all strings which are
  # anagrams of each other will have the same key.
  def key(word)
    word.chars.sort.join
  end

  # Add individual words to the class by generating their key value
  # and adding the word to the set. Using a set guarantees no
  # duplicates of the words, since set contents are unique.
  def add_word(word)
    word_key = key(word)
    @known_words[word_key] << word
    # return the word's key to avoid duplicate work in find_anagrams
    word_key
  end

  def find_anagrams(word)
    word_key = add_word(word)     # add the target word to the known_words
    @known_words[word_key].to_a   # return all anagramatic words as an array
  end

  def inspect
    p @known_words
  end
end

Producing a library of known words looks like this:

ac = AnagramChecker.new
["enlists", "google", "inlets", "banana"].each { |word| ac.add_word word }

ac.inspect   # {"eilnsst"=>#<Set: {"enlists"}>, "eggloo"=>#<Set: {"google"}>, "eilnst"=>#<Set: {"inlets"}>, "aaabnn"=>#<Set: {"banana"}>}

Using it looks like:

p ac.find_anagrams("listen")  # ["inlets", "listen"]
p ac.find_anagrams("google")  # ["google"]

If you don't want the target word to be included in the output, adjust find_anagrams accordingly.

CodePudding user response：

Here's how I would do it.

Method

def anagrams(list, word)
  ltr_freq = word.each_char.tally
  list.select { |w| w.size == word.size && w.each_char.tally == ltr_freq }
end

Example

list = ['helo', 'elloh', 'heelo', 'llohe']
anagrams(list, 'hello')
  #=> ["elloh", "llohe"]

Computational complexity

For practical purposes, the computational complexity of computing w.each_char.tally for any word w of length n can be regarded as O(n). That's because hash key lookups are nearly constant-time. It follows that the computational complexity of determining whether a word of length n is an anagram of another word of the same length can be regarded as O(n).

Explanation

See Enumerable#tally. Note that w.each_char.tally is not computed when w.size == word.size is false.

Now let's add some puts statements to see what is happening.

def anagrams(list, word)
  ltr_freq = word.each_char.tally
  puts "ltr_freq = #{ltr_freq}"
  list.select do |w|
    puts "\nw = '#{w}'"
    if w.size != word.size
      puts "words differ in length"
      false
    else
      puts "w.each_char.tally = #{w.each_char.tally}"
      if w.each_char.tally == ltr_freq
        puts "character frequencies are the same"
        true
      else
        puts "character frequencies differ"
        false
      end
    end
  end
end

anagrams(list, 'hello')
ltr_freq = {"h"=>1, "e"=>1, "l"=>2, "o"=>1}

w = 'helo'
words differ in length

w = 'elloh'
w.each_char.tally = {"e"=>1, "l"=>2, "o"=>1, "h"=>1}
character frequencies are the same

w = 'heelo'
w.each_char.tally = {"h"=>1, "e"=>2, "l"=>1, "o"=>1}
character frequencies differ

w = 'llohe'
w.each_char.tally = {"l"=>2, "o"=>1, "h"=>1, "e"=>1}
character frequencies are the same
  #=>["elloh", "llohe"]

Possible improvement

A potential weakness of the expression

w.each_char.tally == ltr_freq

is that the frequencies of all unique letters in the word w must be determined before a conclusion is reached, even if, for example, the first letter of w does not appear in the word word. We can remedy that as follows.

def anagrams(list, word)
  ltr_freq = word.each_char.tally
  list.select do |w|
    next false unless w.size == word.size
    ltr_freqs_match?(w, ltr_freq.dup)
  end
end

def ltr_freqs_match?(w, ltr_freq)
  w.each_char do |c|
    return false unless ltr_freq.key?(c)
    ltr_freq[c] -= 1
    ltr_freq.delete(c) if ltr_freq[c].zero?
  end
  true
end

One would have to test this variant against the original version of anagrams above to determine which tends to be fastest. This variant has the advantage that it terminates (short-circuits) the comparison as soon as is found that the cumulative count of a given character in the word w is greater than the total count of the same letter in word. At the same time, tally is written in C so it may still be faster.