Codewars - upper and lowercase letters are considered the same character

So i'm on this Kata :

def  first_non_repeating_letter(s) 
  a = s.chars
  a.select! { |char| a.count(char) == 1 }
  if a.empty?
    ("")
  else
    a.first
  end
end

And the only thing i'm missing is :

"As an added challenge, upper- and lowercase letters are considered the same character, but the function should return the correct case for the initial letter. For example, the input 'sTreSS' should return 'T'."

s.downcase.chars doesn't apply here then. I tried with .casecmp but remain unsuccessful. Should i use regex ?

CodePudding user response：

If the given string is s the computational complexity clearly will be at least O(s.size) (since we need to examine the entire string to confirm that a given character appears exactly once in the string). We may therefore look for a method with the same computational complexity, preferably one that employs relatively efficient Ruby built-in methods and is easy to understand and test.

Suppose the given string is as follows:

s = "TgHEtGgh"

The first character that appears only once, assuming case is not considered, is seen to be "E".

As a first step we may wish to compute the frequency of each character in the string, with lowercase and uppercase characters treated alike¹:

sdn = s.downcase
  #=> "tghetggh"
enum = sdn.each_char 
  #=> #<Enumerator: "tghetggh":each_char>
h = enum.tally
  #=> {"t"=>2, "g"=>3, "h"=>2, "e"=>1}

This uses the methods String#downcase, String#each_char and Enumerable#tally. Each of these methods has a computational complexity of O(s.size), so the calculation of h has the same complexity. As a bonus, each of these methods is implemented in C.

We may, of course, chain these methods:

h = s.downcase.each_char.tally
  #=> {"t"=>2, "g"=>3, "h"=>2, "e"=>1}

We may now simply step through the characters of the string s until a character c is found for which h[c.downcase] == 1 is true.

s.each_char.find { |c| h[c.downcase] == 1 }
  #=> "E"

See Enumerable#find.

For this last step to have a computational complexity of O(s.size) the computational complexity of the calculation h[c.downcase] would have to equal 1. In fact, the computational complexity of hash key lookups is slightly greater than 1, but from a practical standpoint we may assume it equals 1.

^{1. Note that we could have obtained the same result by having written arr = sdn.chars #=> ["t", "g", "h", "e", "t", "g", "g", "h"], then h = arr.tally. This has the disadvantage that, unlike String#each_char, String#chars creates a temporary array, consuming memory, though in this case the memory savings by using each_char may be minimal.}

CodePudding user response：

It is kinda tricky with your implementation because now you don't have any explicit comparison operation. Instead, you use a trick with Array#count.

The thing is that this implementation not only inflexible, it is also very inefficient. Its O(n^2) (because for each element traversed by select you call count on the whole array), so for inputs large enough your implementation will be extremely slow.

Good thing is that addressing the performance issue you will be able to easily implement this added challenge too (because the comparison operation(s) will become explicit, so you will be able to downcase only what need to be downcased without affecting the input per se).

Let's think about the generic solution. What is "non-repeating" character? It is a character whose first and last occurrences in a string are the same. So, if we iterate over the string and build some auxiliary data structure that a) keeps this first/last occurrences and b) allows its constant time lookup, we could solve the task in linear time.

Let's go then:

def  first_non_repeating_letter(str) 
  # We'd like to have a hash where keys are chars (downcased)
  # and values are arrays of [<first occurence index>, <last occurence index>]
  positions = {}

  str.each_char.with_index do |c, i|
    key = c.downcase

    if positions.key?(key)
      # We've seen it before, so all we need to do is to update its last occurrence position
      positions[key][1] = i
    else
      # This is the 1st time we met the char `c`. So we make its first
      # and last occurrence positions the same (i) 
      positions[key] = [i, i]
    end
  end

  # At this point, for the given string 'sTreSS' (for example) we would build
  # positions = {"s"=>[0, 5], "t"=>[1, 1], "r"=>[2, 2], "e"=>[3, 3]}

  # Now let's do the main job finally
  str.chars.find { |c| positions[c.downcase][0] == positions[c.downcase][1] } || ""
end

pry(main)> first_non_repeating_letter('sTreSS')
=> "T"