Group anagrams in leetcode without sorting-CodePudding

I came across this solution on Leetcode for group anagrams that doesn't use sorting. I have two questions for this solution. 1. What are we trying to do in the step where we convert sArr to string in this line - String test = Arrays.toString(sArr);I debugged and see the test string is an array of ints with value 1 for each occurence of alphabet in my input string. For eg, if my input string is eat, test prints - [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], which makes sense. But we are further also checking if this exists as a key in the map. It's really hard to follow this code. 2. What's the time complexity? Is it O(m*n) - n being the length of each string in the inner for loop?

public List<List<String>> groupAnagrams(String[] strs) {
    List<List<String>> output = new ArrayList();
    if(strs == null) {
        return output;
    }
    Map<String,List<String>> outputMap = new HashMap();

    for(String str : strs) {
        int[] input = new int[26];
        for(int i = 0; i < str.length(); i  ) {
            input[str.charAt(i) - 'a']  ;
        }
        String inputStr = Arrays.toString(input);
        if(outputMap.containsKey(inputStr)) {
            outputMap.get(inputStr).add(str);
        } else {
            List<String> outputLst = new ArrayList();
            outputLst.add(str);
            outputMap.put(inputStr, outputLst);
        }
    }
    output.addAll(outputMap.values());
    return output;
}

CodePudding user response：

Map<String,List<String>> outputMap = new HashMap();

Output key is [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. This a String representation of the array where each index represents the occurrence of each letter starting from a, b, c to z.

For any anagram, number of occurrence of all possible english letters will be same. So string representation of the array will be same. That's why outputMap.containsKey(inputStr) suggests that anagram is already present for that same letter combinations & adds the new one to the existing list.

Above should answer your first question.

Time complexity is indeed O(m * n) where m is the number of Strings in the input list & n being the average letter count of each String. There is one loop which runs 1 to m times. Then we have a inner loop which runs n times (number of letters) for each String. So it is (m * n).

All other Map & List operations in the solution take O(1) time. All these Map & List operations run m times (running as part of outer loop). So total time taken in worst scenario :

loop: (m * n)
map.containsKey() m times: m * O(1) = m
list.add() m times: m * O(1) = m
map.put() m times: m * O(1) = m

(m * n )   m   m   m = (m * n)   3m = m (n   3)

So if we remove the constant part, time complexity becomes O(m * n).

CodePudding user response：

For your first question, "What are we trying to do in the step where we convert sArr to string in this line - String test = Arrays.toString(sArr);", I am assuming you're referring to the line that converts input into inputStr.

Basically, what it is doing there, is the input array represents an array counting each of the letters within the words, represented 0 - 25, a - z.

It gets turned into a String by using Arrays.toString(input) so that any anagrams will be represented by a string of the same pattern of characters. e.g. "abc" would be equivalent to [1, 1, 1,...], which would match either of "abc", "cba", "cab", "bca", or "bac". (Not sure if I missed one).

As to your second question, "What's the time complexity?", I'm not 100% sure, as time complexity isn't my strong suit, but it appears to be right. A similar leetcode solution, I found was this, which is quite similar to yours, and is O(m*n).

CodePudding user response：

Arrays.toString() Returns a string representation of the contents of the specified array. In your case, it encodes an input to record each characters'occurences in it. Take eat as an example, its encoding is [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], while ate has the same encoding. They are angrams. OutputMap's key is an encoding and the corresponding values are inputs which have the same encoding, in other words, anagrams. We further check if this exists as a key in the map to avoid adding input to an null list and throwing exception. As a result, each entry in the OutputMap is an angram group.

We need to travese each input of length n to encode it. It costs O(n) time. And each loop contains m input. So total time complexity is O(m*n).