Using Hashing to count the number of occurrences of a pattern in an integer array

Time:10-29

I have an integer array, and I define an interval as good iff every integer within the interval appears an even number of times (zero counts as even). I want to find the number of good intervals in a given integer array. For example, if array = [7, 7, 1, 5, 5, 1], the good intervals (1-indexed, inclusive) are [1, 2], [3, 6], [4, 5], [1, 6], corresponding to the contiguous subarrays [7, 7], [1, 5, 5, 1], [5, 5], [7, 7, 1, 5, 5, 1]. If array = [4, 5, 6, 5, 4], then there are no good intervals.

I have a naive solution that uses two nested for loops and checks, for every possible interval, whether it is good, but this takes O(n^2) time. I want a better solution that runs in O(n log n) time, and I feel that hashing may give me one; the problem is that I do not know how to incorporate it into my answer. I have been reading up on the Rabin-Karp rolling hash algorithm for ideas, but I don't think it applies to what I need. Does anyone have ideas for an algorithm that uses hashing to solve this in O(n log n) time?
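
For reference, the two-loop approach described above could look roughly like the sketch below. This is only an illustration of the naive idea, not the asker's actual code; the function name is made up, and it keeps a running set of the values seen an odd number of times so that each interval is checked in O(1) expected time.

```python
def count_good_intervals_bruteforce(arr):
    """Count intervals in which every value occurs an even number of times."""
    total = 0
    for start in range(len(arr)):
        # Values seen an odd number of times in arr[start:end + 1].
        odd = set()
        for end in range(start, len(arr)):
            odd ^= {arr[end]}  # toggle membership of the new element
            if not odd:        # nothing has odd parity: the interval is good
                total += 1
    return total
```

This is handy as a reference implementation for checking a faster solution on small inputs.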

CodePudding user response:

Suppose your array is called A.

For each index i, you could compute the set of the elements in A[:i] that appear an odd number of times. The interval A[i:j] is good exactly when the sets for i and j are equal, so your problem is equivalent to counting all pairs i < j such that these sets are equal.
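
To make the prefix sets concrete, here is a small sketch that lists them for an array. The helper name prefix_odd_sets is made up for this illustration.

```python
def prefix_odd_sets(A):
    """Return S where S[i] is the set of values that occur an odd number
    of times in A[:i], for i = 0 .. len(A)."""
    current = frozenset()
    sets = [current]
    for a in A:
        current = current ^ {a}  # toggle a: add it if absent, drop it if present
        sets.append(current)
    return sets


sets = prefix_odd_sets([7, 7, 1, 5, 5, 1])
# The empty set occurs at i = 0, 2, 6 (three pairs) and {1} at i = 3, 5
# (one pair), which gives the four good intervals from the question.
```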

This is still O(n^2) in the worst case, because each set can hold O(n) elements. Instead of the sets themselves, you can use a hash of each set. For efficiency, each hash needs to be incrementally computable from the hash of the previous set. One way to do this is to XOR together the values of a universal hash function applied to each element of the set. With this, you can add or remove a single element from the hash in O(1), and adding and removing an element are exactly the same operation. That makes it very suitable for this problem, where only the parity of each element's count matters, not the exact count.

So compute this new array for indexes 0 to n inclusive:

B[0] = 0
B[i+1] = HASH(A[i]) XOR B[i]

Then count all 0<=i<j<=n such that B[i]=B[j] (which you can do in O(n) time, for example with a regular map).

This is a probabilistically correct algorithm: if you are unlucky, two different sets can get the same hash (for example, a non-empty set can have zero hash). If you use a universal b-bit hash, a lower bound for the probability that it's correct is approximately exp(-n²/2^(b+1)), obtained from the birthday problem. So if you use a 128-bit hash, you're pretty safe for any input you're likely to find in practice.
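
To get a feel for the numbers, the birthday-problem expression exp(-n²/2^(b+1)) approximates the probability that no two of n random b-bit hashes collide. The sketch below evaluates its complement, the chance of at least one collision; the function name is made up, and it assumes the hashes behave like independent uniform values.

```python
import math


def collision_probability_estimate(n, b):
    """Approximate chance that n random b-bit hashes contain a collision,
    i.e. 1 - exp(-n^2 / 2^(b+1))."""
    x = n * n / 2 ** (b + 1)
    # expm1 keeps precision when x is tiny, where 1 - exp(-x) would round to 0.
    return -math.expm1(-x)


p128 = collision_probability_estimate(10**6, 128)  # on the order of 1e-27
p32 = collision_probability_estimate(10**6, 32)    # very close to 1
```

Under these assumptions, a 32-bit hash would almost certainly collide for a million elements, while a 128-bit hash leaves a negligible chance.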

As examples, here's Python code that implements the naive version which uses sets and runs in O(n^2) in the worst case.

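import collections

def naive_evens(A):
    # B is the set of values seen an odd number of times in the prefix so far.
    B = frozenset()
    # counts[S] is the number of prefixes whose odd-parity set equals S.
    counts = collections.Counter()
    counts[B] += 1  # the empty prefix
    total = 0
    for a in A:
        B = B.symmetric_difference({a})  # toggle the parity of a
        total += counts[B]  # each earlier equal prefix gives one good interval
        counts[B] += 1
    return total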

Here's the probabilistically correct version that uses hashing and runs in O(n) time. It uses HASH as a universal hash (with random seed HA), and parameters HW and HM which describe the word-size and number of bits of hash to create. To avoid hashing 0 to 0, the array elements are modified so that they're all positive (by adding something to each element so that the minimum element is always 1).

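import collections
import random

HW = 256  # word size in bits
HM = 128  # number of hash bits to keep
HA = random.randrange(1 << HW)  # random seed

def HASH(x):
    # Multiply by the seed modulo 2^HW, then keep the top HM bits.
    h = (HA * x) % (1 << HW)
    return h >> (HW - HM)

def smart_evens(A):
    if not A:
        return 0  # min() below would fail on an empty array
    B = 0  # XOR of the hashes of the values with odd count so far
    counts = collections.Counter()
    counts[B] += 1  # the empty prefix
    total = 0
    M = min(A)
    for x in A:
        # Shift so the smallest element maps to 1 and nothing hashes 0 to 0.
        B = B ^ HASH(x - M + 1)
        total += counts[B]  # each earlier equal prefix hash gives one interval
        counts[B] += 1
    return total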

CodePudding user response:

Linear time, linear space

Assumptions: Your integers are non-negative and contiguous, or at least small. If you have integers that are much bigger than the number of distinct integers, you should use a hash to give each integer a unique id: 0, 1, 2, ..., and apply my algorithm to these ids.
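
One possible way to assign such ids, sketched in Python with a made-up helper name, is to hand out the next unused id the first time each value is seen:

```python
def compress_ids(arr):
    """Map each distinct value to an id 0, 1, 2, ... in order of first
    appearance, and return the array rewritten in terms of those ids."""
    ids = {}
    # setdefault returns the existing id, or stores len(ids) as a new one.
    return [ids.setdefault(v, len(ids)) for v in arr]


compressed = compress_ids([10**9, -5, 10**9, 42])  # [0, 1, 0, 2]
```

Equal values get equal ids, so the good intervals of the compressed array are the same as those of the original.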

Initialize the following:

bitvec = 0. We'll use this to keep track of the parity of the number of times we've seen each element so far as we parse the input array.

bitvec_to_count: a hash that has a starting value of zero for each new key, and will keep track of the number of times we've seen each bitvec.

Set bitvec_to_count[0] = 1. We use this to say that we've seen the bitvec representing no elements once.

Now parse the array. For each element of the array, with value v, flip the v'th bit of the bitvector (because the parity of that element has changed). Increment the count of the resulting bitvector.

Finally, take the sum of choose(count, 2) for all the counts (value) in the bitvec_to_count hash. This is your answer.

This takes advantage of the fact that good intervals are exactly those where the parities of all elements are identical just prior to the start and at the end (because the parities of the elements within the interval itself are all zero, since each shows up an even number of times).

Here's working Ruby code; should be simple to translate to Python.

def f(arr)
  bitvec = 0
  # Unseen keys start at 0.
  bitvec_to_count = Hash.new { |h, k| h[k] = 0 }
  bitvec_to_count[bitvec] += 1

  arr.each do |val|
    bitvec ^= 1 << val  # flip the parity bit for this value
    bitvec_to_count[bitvec] += 1
  end

  ans = 0

  bitvec_to_count.values.each do |count|
    ans += count * (count - 1) / 2  # choose(count, 2)
  end

  return ans
end
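
A possible Python translation of the Ruby code is sketched below. The function name is made up, and it carries the same assumption as the answer: the values are small non-negative integers, so 1 << val stays a manageable size.

```python
from collections import Counter


def count_good_intervals_bitvec(arr):
    """Count good intervals by tracking a parity bit per value."""
    bitvec = 0
    bitvec_to_count = Counter({0: 1})  # the empty prefix has been seen once
    for val in arr:
        bitvec ^= 1 << val             # flip the parity bit for this value
        bitvec_to_count[bitvec] += 1
    # Any two prefixes with the same bitvec bound a good interval.
    return sum(c * (c - 1) // 2 for c in bitvec_to_count.values())
```

On the two arrays from the question this gives 4 and 0 respectively.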