Home > database >  Counting subsequences in python
Counting subsequences in python

Time:02-22

  1. Counting Subsequences

A sequence of numbers is said to be good if it satisfies the following two conditions:

  1. All elements in the sequence are unique.
  2. If the minimum element in the sequence is a, and the maximum element in the sequence is b, then all numbers in the range [a, b] are present in the sequence. For example, (4, 2, 5, 1, 3) is a good sequence, while (2, 2, 1) or (3, 7) are not. A subsequence of an array arr is a sequence that can be derived from the array arr by deleting some or no elements without changing the order of the remaining elements.

Given an array of n integers, find the number of different good subsequences. Two subsequences are considered different if they include at least one different index. For example, for the sequence (2, 2, 1), both (2, 1) formed by indices 1 and 2 and (2, 1), formed by indices 0 and 2 are considered different subsequences. Example Consider the array arr = [13, 11, 4, 12, 5, 4]. We can form the following good subsequences: • Subsequences of length 1: [13], [11], [4], [12], [5], [4],

• Subsequences of length 2: [13, 12], [11, 12], [4, 5], [5, 4]

• Subsequences of length 3: [13, 11, 12] Thus, the answer is 6 4 1 = 11. Function Description Complete the function countGoodSubsequences in the editor below. countGoodSubsequences has the following parameter: int arr[n]: the given array of integers Returns long int: the number of good subsequences which can be derived from the array,

this is my code:

import itertools
def is_good_sequence(array):
    minimum = min(array)
    maximum = max(array)
    good_sequence_list = list(range(minimum, maximum 1))
    checked = []
    if sorted(array) != good_sequence_list:
        return False

    for i in array:
        if i in checked:
            return False
        else:
            checked.append(i)

    return True


def countGoodSubsequences(arr):
    good_sequences = []
    for i in range(1, len(arr) 1):
        t = list(itertools.permutations(arr, i))
        for j in t:
            if is_good_sequence(j):
                good_sequences.append(j)
    return good_sequences

however it doesn't return the excepted answer

CodePudding user response:

In order to tackle your problem, you can use the build in set to get unique numbers and sorted to get consecutive numbers.

numbers =  [13, 11, 4, 12, 5, 4]

unique_numbers = sorted(set(numbers))
gaps = [
    i for i in range(1, len(unique_numbers)) 
    if unique_numbers[i] - unique_numbers[i-1] > 1
]

consecuteves = []
i0 = i1 = 0
for i1 in gaps:
    consecuteves.append(tuple(unique_numbers[i0:i1]))
    i0 = i1
consecuteves.append(tuple(unique_numbers[i1:len(unique_numbers)]))
consecuteves

From this point you can use a recursive function to split the consecutive numbers in smaller chunks.

A possible recursion could look like this

def split(consecutive):
    result = []
    c1 = consecutive[:-1]
    c2 = consecutive[1:]
    if len(consecutive) > 1:
        result.extend([c1, c2])
    if len(consecutive) > 2:
        sub_1 = split(c1)
        sub_2 = split(c2)
        if sub_1:
            result.extend(sub_1)
        if sub_2:
            result.extend(sub_2)
    return result

so you could get your chunks as follows:

result = set()
for consecutive in consecuteves:
    result.update(split(consecutive))
sorted(result, key=lambda s: (len(s), s))

However, note that if you only need the number of good sequences, you do not need to create them as I do above, you can just calculate the number of sub-sequences for consecutive numbers. I leave this combinatoric exercise for you.

Since I love mathematical riddles, I got your solution:

from collections import defaultdict

def frequency_count(sequence):
    frequency = defaultdict(int)
    for n in sequence:
        frequency[n]  = 1
    return sorted(frequency.items())

def split(frequency):
    sequence = [n for n, m in frequency]
    gaps = [i for i in range(1, len(sequence)) if sequence[i] - sequence[i-1] > 1]
    i0 = i1 = 0
    for i1 in gaps:
        yield frequency[i0:i1]
    yield frequency[i1:len(frequency)]

def count_unique_consecutive(n):
    return n*(n 1)//2

def _count_subsets(frequency):
    for i, (n, m) in enumerate(frequency):
        if m > 1:
            f_new = frequency.copy()
            f_new[i] = (n, 1)
            n_single = _count_subsets(f_new)
            n_left = count_unique_consecutive(i)
            n_right = _count_subsets(frequency[i 1:])
            return m*n_single - (m-1)*n_left - (m-1)*n_right
    return count_unique_consecutive(len(frequency))

def count_subsets(sequence):
    frequency = frequency_count(sequence)
    consecutive = split(frequency)
    return sum(map(_count_subsets, consecutive))

count_subsets([1,2,2,2,3,8,9,10])

The combinatoric part, counting subsets in a consecutive sequences, is basically the second diagonal of Pascal's triangle. I leave it to you to understand why. Counting frequencies is done with a default dictionary. In order to take care of multiplicity, you need to multiply the number of possible subsets where the multiple value appears. This is done by multiplying the entire consecutive range and subtracting the subsets left and right which do not overlap with the multi-count number so they only count once.

  • Related