Assume a sequence (np.array
) of 1s and 0s. I need to construct 2-nd order Markov chain. For that, I need to count to proportion of T001, ... T111, where T001 is the number of times the sequences had a [0,0,1] occuring. Obviously I could do a for loop, but I am trying to avoid that. Ideally, the method could be extended to k-oder.
CodePudding user response:
The task can be split into two subtasks: First is creating all the subsequences of length 3, then counting all the unique subsequences. Both these tasks can be done with built in numpy
methods:
Assuming the input sequence is stored as a numpy array in a
, the first point can be achieved with
windows = np.lib.stride_tricks.sliding_window_view(a, 3)
Then we can count the distinct windows using
unique_windows, counts = np.unique(windows, axis=0, return_counts=True)
CodePudding user response:
You can use a combination of more_itertools.windowed
and collections.Counter
:
from more_itertools import windowed
from collections import Counter
import numpy as np
np.random.seed(0)
a = np.random.choice([0,1], size=100)
Counter(windowed(a, n=3))
input:
array([0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1,
1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1,
1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0])
output:
Counter({(0, 1, 1): 14,
(1, 1, 0): 14,
(1, 0, 1): 15,
(1, 1, 1): 16,
(1, 0, 0): 10,
(0, 0, 1): 10,
(0, 1, 0): 12,
(0, 0, 0): 7})
With custom keys:
{'T' ''.join(map(str, k)): v for k,v in Counter(windowed(a, n=3)).items()}
output:
{'T011': 14,
'T110': 14,
'T101': 15,
'T111': 16,
'T100': 10,
'T001': 10,
'T010': 12,
'T000': 7}
How to ensure all values are present (even if zero count):
easy solution: generate a list of all possible keys and use this to iterate
from itertools import product
possible_keys = ['T' ''.join(map(str, k)) for k in product([0,1], repeat=3)]
output:
['T000', 'T001', 'T010', 'T011', 'T100', 'T101', 'T110', 'T111']
The you can combine with the use of the dictionary get
method:
## as comprehension
{k: out.get(k, 0) for k in possible_keys}
## as classical loop
for key in possible_keys:
out.get(key, 0) # default value is 0
full example:
from itertools import product
a = [0,1,1,0]
out = {'T' ''.join(map(str, k)): v for k,v in Counter(windowed(a, n=3)).items()}
possible_keys = ['T' ''.join(map(str, k)) for k in product([0,1], repeat=3)]
out = {k: out.get(k, 0) for k in possible_keys}
output:
{'T000': 0,
'T001': 0,
'T010': 0,
'T011': 1,
'T100': 0,
'T101': 0,
'T110': 1,
'T111': 0}
dictionary updating (python ≥3.9)
from itertools import product
default = {'T' ''.join(map(str, k)): 0 for k in product([0,1], repeat=3)}
out = default | out # update default with the values in out