Count number of wonderful substrings-CodePudding

I found below problem in one website.

A wonderful string is a string where at most one letter appears an odd number of times.
For example, "ccjjc" and "abab" are wonderful, but "ab" is not.
Given a string word that consists of the first ten lowercase English letters ('a' through 'j'), return the number of wonderful non-empty substrings in word. If the same substring appears multiple times in word, then count each occurrence separately.
A substring is a contiguous sequence of characters in a string.
Example 1 :
Input: word = "aba"
Output: 4
Explanation: The four wonderful substrings are  a , b , a(last character) , aba.

I tried to solve it. I implemented a O(n^2) solution (n is input string length). But expected time complexity is O(n). I could not solve it in O(n). I found below solution but could not understood it. Can you please help me to understand below O(n) solution for this problem or come up with an O(n) solution? My O(N^2) approach - for every substring check whether it has at most one odd count char. This check can be done in O(1) time using an 10 character array.

class Solution {
public:
    long long wonderfulSubstrings(string str) {
      long long ans=0;
      int idx=0;  long long xorsum=0;
      unordered_map<long long,long long>mp;
      mp[xorsum]  ;
      while(idx<str.length()){
         xorsum=xorsum^(1<<(str[idx]-'a'));
          // if xor is repeating it means it is having even ouccrences of all elements
          // after the previos ouccerence of xor.
         if(mp.find(xorsum)!=mp.end())
             ans =mp[xorsum];
          mp[xorsum]  ;
          // if xor will have at most 1 odd character than check by xoring with (a to j)
          // check correspondingly in the map
          for(int i=0;i<10;i  ){
              long long temp=xorsum;
              temp=temp^(1<<i);
              if(mp.find(temp)!=mp.end())
                  ans =mp[temp];
          }
          idx  ;
      }
      return ans;
    }
};

CodePudding user response：

There's two main algorithmic tricks in the code, bitmasks and prefix-sums, which can be confusing if you've never seen them before. Let's look at how the problem is solved conceptually first.

For any substring of our string S, we want to count the number of appearances for each of the 10 possible letters, and ask if each number is even or odd.

For example, with a substring s = accjjc, we can summarize it as: odd# a, even# b, odd# c, even# d, even# e, even# f, even# g, even# h, even# i, even# j. This is kind of long, so we can summarize it using a bitmask: for each letter a-j, put a 1 if the count is odd, or 0 if the count is even. This gives us a 10-digit binary string, which is 1010000000 for our example.

You can treat this as a normal integer (or long long, depending on how ints are represented). When we see another character, the count flips whether it was even or odd. On bitmasks, this is the same as flipping a single bit, or an XOR operation. If we add another 'a', we can update the bitmask to start with 'even# a' by XORing it with the number 1000000000.

We want to count the number of substrings where at most one character count is odd. This is the same as counting the number of substrings whose bitmask has at most one 1. There are 11 of these bitmasks: the ten-zero string, and each string with exactly one 1 for each of the ten possible spots. If you interpret these as integers, the last ten strings are the first ten powers of 2: 1<<0, 1<<1, 1<<2, ... 1<<9.

Now, we want to count the bitmasks for all substrings in O(n) time. First, solve a simpler problem: count the bitmasks for just all prefixes, and store these counts in a hashmap. We can do this by keeping a running bitmask from the start, and performing updates by an XOR of the bit corresponding to that letter: xorsum=xorsum^(1<<(str[idx]-'a')). This can clearly be done in a single, O(n) time pass through the string.

How do we get counts of arbitrary substrings? The answer is prefix-sums: the count of letters in any substring can be expressed as a different of two prefix-counts. For example, with s = accjjc, suppose we want the bitmask corresponding to the substring 'jj'. This substring can be written as the difference of two prefixes: 'jj' = 'accjj' - 'acc'.

In the same way, we want to subtract the counts for the two prefix strings. However, we only have the bitmasks telling us whether each letter has an even or odd frequency. In the arithmetic of bitmasks, we treat each position mod 2, so coordinate-wise subtraction becomes XOR.

This means counts(jj) = counts(accjj) - counts(acc) becomes bitmask(jj) = bitmask(accjj) ^ bitmask(acc).

There's still a problem: the algorithm I've described is still quadratic. If, at every prefix, we iterate over all previous prefix-bitmasks and check if our mask XOR the old mask is one of the 11 goal-bitmasks, we still have a quadratic runtime. Instead, you can use the fact that XOR is its own inverse: if a ^ b = c, then b = a ^ c. So, instead of doing XORs with old prefix masks, you XOR with the 11 goal masks and add the number of times we've seen that mask: ans =mp[xorsum] counts the substrings ending at our current index whose bitmask is xorsum ^ 0000000000 = xorsum. The loop after that counts substrings whose bitmask is one of the ten goal bitmasks. Lastly, you just have to add your current prefix-mask to update the counts: mp[xorsum] .