Smart way of counting sequences of 1s and 0s (Markov chain)-CodePudding

Assume a sequence (np.array) of 1s and 0s. I need to construct 2-nd order Markov chain. For that, I need to count to proportion of T001, ... T111, where T001 is the number of times the sequences had a [0,0,1] occuring. Obviously I could do a for loop, but I am trying to avoid that. Ideally, the method could be extended to k-oder.

CodePudding user response：

The task can be split into two subtasks: First is creating all the subsequences of length 3, then counting all the unique subsequences. Both these tasks can be done with built in numpy methods:

Assuming the input sequence is stored as a numpy array in a, the first point can be achieved with

windows = np.lib.stride_tricks.sliding_window_view(a, 3)

Then we can count the distinct windows using

unique_windows, counts = np.unique(windows, axis=0, return_counts=True)

CodePudding user response：

You can use a combination of more_itertools.windowed and collections.Counter:

from more_itertools import windowed
from collections import Counter
import numpy as np

np.random.seed(0)
a = np.random.choice([0,1], size=100)

Counter(windowed(a, n=3))

input:

array([0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1,
       1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
       0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0])

output:

Counter({(0, 1, 1): 14,
         (1, 1, 0): 14,
         (1, 0, 1): 15,
         (1, 1, 1): 16,
         (1, 0, 0): 10,
         (0, 0, 1): 10,
         (0, 1, 0): 12,
         (0, 0, 0): 7})

With custom keys:

{'T' ''.join(map(str, k)): v for k,v in Counter(windowed(a, n=3)).items()}

output:

{'T011': 14,
 'T110': 14,
 'T101': 15,
 'T111': 16,
 'T100': 10,
 'T001': 10,
 'T010': 12,
 'T000': 7}

How to ensure all values are present (even if zero count):

easy solution: generate a list of all possible keys and use this to iterate

from itertools import product
possible_keys = ['T' ''.join(map(str, k)) for k in product([0,1], repeat=3)]

output:

['T000', 'T001', 'T010', 'T011', 'T100', 'T101', 'T110', 'T111']

The you can combine with the use of the dictionary get method:

## as comprehension
{k: out.get(k, 0) for k in possible_keys}

## as classical loop
for key in possible_keys:
    out.get(key, 0) # default value is 0

full example:

from itertools import product
a = [0,1,1,0]
out = {'T' ''.join(map(str, k)): v for k,v in Counter(windowed(a, n=3)).items()}
possible_keys = ['T' ''.join(map(str, k)) for k in product([0,1], repeat=3)]
out = {k: out.get(k, 0) for k in possible_keys}

output:

{'T000': 0,
 'T001': 0,
 'T010': 0,
 'T011': 1,
 'T100': 0,
 'T101': 0,
 'T110': 1,
 'T111': 0}

dictionary updating (python ≥3.9)

from itertools import product
default = {'T' ''.join(map(str, k)): 0 for k in product([0,1], repeat=3)}
out = default | out # update default with the values in out