How do I optimize this xor sum algorithm?-CodePudding

I'm trying to solve this hackerrank problem https://www.hackerrank.com/challenges/xor-subsequence/problem

from functools import reduce

def xor_sum(arr):
    return reduce(lambda x,y: x^y, arr)    

def xorSubsequence(arr):
    freq = {}
    max_c = float("-inf") # init val
    min_n = float("inf") # init val
    
    for slice_size in range(1, len(arr) 1):
        for step in range(0, len(arr) 1-slice_size):
            n = xor_sum(arr[i] for i in range(step,step slice_size))
            
            freq[n] = freq.get(n,0) 1
            if freq[n] >= max_c and (n < min_n or freq[n]> max_c):
                min_n = n
                max_c = freq[n]

    return  min_n, freq[min_n]

But it times out since it's ~O(n^3). I feel like there is some math trick, can someone explain the solution to me? I tried to read some solutions in the discussion but I didn't quite get them.

Problem copy:

Consider an array, A , of n integers (A=a_0,a_1,...,a_n-1). We take all consecutive subsequences of integers from the array that satisfy the following: {a_i,a_i 1,...a_j-1,a_j}, where 0<=i<=j<=n

For each subsequence, we apply the bitwise XOR () operation on all the integers and record the resultant value.

Given array A, find the XOR sum of every subsequence of A and determine the frequency at which each number occurs. Then print the number and its respective frequency as two space-separated values on a single line.

Output Format

Print 2 space-separated integers on a single line. The first integer should be the number having the highest frequency, and the second integer should be the number's frequency (i.e., the number of times it appeared). If there are multiple numbers having maximal frequency, choose the smallest one.

Constraints

1<=n<=10^5

1<=a_i<=2^16

CodePudding user response：

Note that your algorithms is not O(n²) but O(n³), but you can reduce it to O(n²) by just xor-ing each new number to all the results from the last "slice" (plus that number itself, starting a new subsequence).

from collections import Counter
def xor_sub_sum(arr):
    freq = Counter()
    last = []
    for x in arr:
        last = [x, *(x^y for y in last)]
        freq.update(last)
    # "most_common" does not consider smallest-key constraint...
    return max(freq.items(), key=lambda t: (t[1], -t[0]))

On my machine, this reduces the execution time for 1000 elements from 21.7 seconds to just 0.05 seconds. For 10000, it still takes ~5 seconds, though.

CodePudding user response：

Okay, I still don't fully understand it, but I was able to pass all the tests by implementing an O(n log(n)) fwht() based on Wikipedia's description, plus pulling from another existing solution:

from collections import Counter
from itertools import accumulate
from operator import xor


def fwht(arr):
    # https://en.wikipedia.org/wiki/Fast_Walsh–Hadamard_transform
    if len(arr) == 1:
        return arr

    prefix, suffix = arr[:len(arr) // 2], arr[len(arr) // 2:]

    new_prefix = fwht([p   s for p, s in zip(prefix, suffix)])
    new_suffix = fwht([p - s for p, s in zip(prefix, suffix)])
    return new_prefix   new_suffix


def xorSubsequence(seq):
    next_pow2 = 2**(len(seq) - 1).bit_length()

    histogram = Counter(accumulate([0]   seq, xor))
    histogram = [histogram[value] for value in range(next_pow2)]

    histogram = [x * x for x in fwht(histogram)]
    histogram = [y // next_pow2 for y in fwht(histogram)]

    histogram[0] -= len(seq)   1             # self combos (diagonal in table)
    histogram = [y // 2 for y in histogram]  # don't count things twice

    max_freq = max(histogram)
    return next((i, freq) for i, freq in enumerate(histogram) if freq == max_freq)