i a m a begginner in ruby please i need directions in how to get the program to return a list containing "inlets"
Question: Given a word and a list of possible anagrams, select the correct sublist.
Given "listen" and a list of candidates like "enlists" "google" "inlets" "banana" the program should return a list containing "inlets".
This is what i have been able to do
puts 'Enter word'
word_input = gets.chomp
puts 'Enter anagram_list'
potential_anagrams = gets.chomp
potential_anagrams.each do |anagram|
end
this is how the program should behave assuming my word_input was "hello"
but i do not know how to get this working.
is_anagram? word: 'hello', anagrams: ['helo', 'elloh', 'heelo', 'llohe']
# => 'correct anagrams are: elloh, llohe'
Would really appreciate ideas.
CodePudding user response:
As hinted in comments, a string can be easily converted to an array of its characters.
irb(main):005:0> "hello".chars
=> ["h", "e", "l", "l", "o"]
irb(main):006:0> "lleho".chars
=> ["l", "l", "e", "h", "o"]
Arrays can be easily sorted.
irb(main):007:0> ["h", "e", "l", "l", "o"].sort
=> ["e", "h", "l", "l", "o"]
irb(main):008:0> ["l", "l", "e", "h", "o"].sort
=> ["e", "h", "l", "l", "o"]
And arrays can be compared.
irb(main):009:0> ["e", "h", "l", "l", "o"] == ["e", "h", "l",
l", "o"]
=> true
Put all of this together and you should be able to determine if one word is an anagram of another. You can then pair this with #select
to find the anagrams in an array. Something like:
def is_anagram?(word, words)
words.select do |w|
...
end
end
CodePudding user response:
I took the tack of creating a class in pursuit of re-usability. This is overkill for a one-off usage, but allows you to build up a set of known words and then poll it as many times as you like for anagrams of multiple candidate words.
This solution is built on hashes and sets, using the sorted characters of a word as the index to a set of words sharing the same letters. Hashing is O(1), Sets are O(1), and if we view words as having a bounded length the calculation of the key is also O(1), yielding an overall complexity of a constant time per word.
I've commented the code, but if anything is unclear feel free to ask.
require 'set'
class AnagramChecker
def initialize
# A hash whose default value is an empty set object
@known_words = Hash.new { |h, k| h[k] = Set.new }
end
# Create a key value from a string by breaking it into individual
# characters, sorting, and rejoining, so all strings which are
# anagrams of each other will have the same key.
def key(word)
word.chars.sort.join
end
# Add individual words to the class by generating their key value
# and adding the word to the set. Using a set guarantees no
# duplicates of the words, since set contents are unique.
def add_word(word)
word_key = key(word)
@known_words[word_key] << word
# return the word's key to avoid duplicate work in find_anagrams
word_key
end
def find_anagrams(word)
word_key = add_word(word) # add the target word to the known_words
@known_words[word_key].to_a # return all anagramatic words as an array
end
def inspect
p @known_words
end
end
Producing a library of known words looks like this:
ac = AnagramChecker.new
["enlists", "google", "inlets", "banana"].each { |word| ac.add_word word }
ac.inspect # {"eilnsst"=>#<Set: {"enlists"}>, "eggloo"=>#<Set: {"google"}>, "eilnst"=>#<Set: {"inlets"}>, "aaabnn"=>#<Set: {"banana"}>}
Using it looks like:
p ac.find_anagrams("listen") # ["inlets", "listen"]
p ac.find_anagrams("google") # ["google"]
If you don't want the target word to be included in the output, adjust find_anagrams
accordingly.
CodePudding user response:
Here's how I would do it.
Method
def anagrams(list, word)
ltr_freq = word.each_char.tally
list.select { |w| w.size == word.size && w.each_char.tally == ltr_freq }
end
Example
list = ['helo', 'elloh', 'heelo', 'llohe']
anagrams(list, 'hello')
#=> ["elloh", "llohe"]
Computational complexity
For practical purposes, the computational complexity of computing w.each_char.tally
for any word w
of length n
can be regarded as O(n
). That's because hash key lookups are nearly constant-time. It follows that the computational complexity of determining whether a word of length n
is an anagram of another word of the same length can be regarded as O(n
).
Explanation
See Enumerable#tally. Note that w.each_char.tally
is not computed when w.size == word.size
is false
.
Now let's add some puts
statements to see what is happening.
def anagrams(list, word)
ltr_freq = word.each_char.tally
puts "ltr_freq = #{ltr_freq}"
list.select do |w|
puts "\nw = '#{w}'"
if w.size != word.size
puts "words differ in length"
false
else
puts "w.each_char.tally = #{w.each_char.tally}"
if w.each_char.tally == ltr_freq
puts "character frequencies are the same"
true
else
puts "character frequencies differ"
false
end
end
end
end
anagrams(list, 'hello')
ltr_freq = {"h"=>1, "e"=>1, "l"=>2, "o"=>1}
w = 'helo'
words differ in length
w = 'elloh'
w.each_char.tally = {"e"=>1, "l"=>2, "o"=>1, "h"=>1}
character frequencies are the same
w = 'heelo'
w.each_char.tally = {"h"=>1, "e"=>2, "l"=>1, "o"=>1}
character frequencies differ
w = 'llohe'
w.each_char.tally = {"l"=>2, "o"=>1, "h"=>1, "e"=>1}
character frequencies are the same
#=>["elloh", "llohe"]
Possible improvement
A potential weakness of the expression
w.each_char.tally == ltr_freq
is that the frequencies of all unique letters in the word w
must be determined before a conclusion is reached, even if, for example, the first letter of w
does not appear in the word word
. We can remedy that as follows.
def anagrams(list, word)
ltr_freq = word.each_char.tally
list.select do |w|
next false unless w.size == word.size
ltr_freqs_match?(w, ltr_freq.dup)
end
end
def ltr_freqs_match?(w, ltr_freq)
w.each_char do |c|
return false unless ltr_freq.key?(c)
ltr_freq[c] -= 1
ltr_freq.delete(c) if ltr_freq[c].zero?
end
true
end
One would have to test this variant against the original version of anagrams
above to determine which tends to be fastest. This variant has the advantage that it terminates (short-circuits) the comparison as soon as is found that the cumulative count of a given character in the word w
is greater than the total count of the same letter in word
. At the same time, tally
is written in C so it may still be faster.