Why is this too slow?-CodePudding

On LeetCode, this is accepted but fails the submission because it's too slow for a very long string. What am I doing that is too slow that I should speed up?

The goal is to count the unique number of characters in a string when splitting it up as many ways as possible.

def numSplits(s):
    good_splits = 0
    string1 = ""
    string2 = ""
    repeated_chars_st1 = dict()
    repeated_chars_st2 = dict()
    for i in range(len(s)):
        string1  = s[i]
        string2 = s[i 1:]
        for char in string1:
            if char in repeated_chars_st1:
                repeated_chars_st1[char]  = 1
            else:
                repeated_chars_st1[char] = 1
        for char2 in string2:
            if char2 in repeated_chars_st2:
                repeated_chars_st2[char2]  = 1
            else:
                repeated_chars_st2[char2] = 1
        left_counter = len(repeated_chars_st1.keys())
        right_counter = len(repeated_chars_st2.keys())
        if left_counter == right_counter:
            good_splits  = 1
        
        repeated_chars_st1.clear()
        repeated_chars_st2.clear()
        
    return good_splits

print(numSplits('aacaba'))

Here is the LeetCode question:

You are given a string s.

A split is called good if you can split s into two non-empty strings sleft and sright where their concatenation is equal to s (i.e., sleft sright = s) and the number of distinct letters in sleft and sright is the same.

Return the number of good splits you can make in s.

Example 1:

Input: s = "aacaba"

Output: 2

Explanation: There are 5 ways to split aacaba and 2 of them are good.

("a", "acaba") Left string and right string contains 1 and 3 different letters respectively.
("aa", "caba") Left string and right string contains 1 and 3 different letters respectively.
("aac", "aba") Left string and right string contains 2 and 2 different letters respectively (good split).
("aaca", "ba") Left string and right string contains 2 and 2 different letters respectively (good split).
("aacab", "a") Left string and right string contains 3 and 1 different letters respectively.

Example 2:

Input: s = "abcd"

Output: 1

Explanation: Split the string as follows ("ab", "cd").

Constraints:

1 <= s.length <= 10^5
s consists of only lowercase English letters.

CodePudding user response：

Here's a fairly quick solution. Store unique chars seen for both left and right sides of the split in a set and collections.Counter, keep track of the current counts so that we don't have to make unnecessary calls to len, then iterate over the string updating the data-structures and counts when needed and break when the left is bigger than the right

def numSplits(self, s: str) -> int:
    count = 0
    left = set()
    left_count = 0
    right = Counter(s)
    right_count = len(right)
    for c in s:
        if c not in left:
            left.add(c)
            left_count  = 1
        right[c] -= 1
        if right[c] == 0:
            right_count -= 1
        if left_count == right_count:
            count  = 1
        elif left_count > right_count:
            break
    return count

CodePudding user response：

Probably Ritwick previous answer if more efficient, but if you want to learn something, you can see that you should gain a lot of performance if you reuse the dictionary from the previous iteration. If you analyze this, the dictionaries are very similar in adjacent splits, in fact, the dictionary is the same it only "moves" one character from a dict to the other.

Using the above statement another possible solution is

def numSplits(s):
    good_splits = 0
    repeated_chars_st1 = dict()
    repeated_chars_st2 = dict()

    # Put everything in dict1
    for char in s:
        if char in repeated_chars_st1:
            repeated_chars_st1[char]  =1
        else:
            repeated_chars_st1[char] = 1

    for i in range(len(s)):
        # Move the current character
        char = s[i]
        repeated_chars_st1[char] -= 1
        if repeated_chars_st1[char] <= 0:
            del repeated_chars_st1[char]

        if char in repeated_chars_st2:
            repeated_chars_st2[char]  =1
        else:
            repeated_chars_st2[char] = 1

        # Check good split as before
        left_counter = len(repeated_chars_st1.keys())
        right_counter = len(repeated_chars_st2.keys())
        if left_counter == right_counter:
            good_splits  =1
    
    
    return good_splits

CodePudding user response：

Two problems:

You are actually assigning the two substrings to variables. This is unnecessary copying.
After you move a char from string2 to string1, you recount all chars in both strings.

To be fast enough, one way is to have two hashmaps that count chars in string1 and string2.

Each iteration, you transfer one char and modify only its two counters. If a counter reaches zero or becomes non-zero, you have a new number of distinct chars.

Whenever the number of distinct chars in both counters is equal, increase the result by one and return the result in the end.

This code should make it clear. It runs in O(n).

class Solution:
    def numSplits(self, s: str) -> int:
        counter1 = {}
        counter2 = {}
        for c in s:
            if c not in counter2:
                counter2[c] = 0
            counter2[c]  = 1
        distinct1 = 0
        distinct2 = len(counter2.keys())
        result = 0
        
        for c in s:
            if distinct1 == distinct2:
                result  = 1
            
            # add c to string1
            if c not in counter1:
                distinct1  = 1
                counter1[c] = 0
            counter1[c]  = 1
            
            # remove c from string2
            counter2[c] -= 1
            if counter2[c] == 0:
                distinct2 -= 1
                del counter2[c]
        
        return result