How to count the unique values in a sliding window?

Time:10-18

Let's say that we have a DataFrame that simply contains the outcome of tossing a coin N times.

    outcome
0     H
1     T
2     H
3     H
4     H
5     T
6     H 

For our example, let's suppose that we'd like to examine a sliding window of 3 and we'd like to count how many times each window (order preserved) appeared in the dataset.

The sliding windows of 3 in the dataset are:

  • H-T-H
  • T-H-H
  • H-H-H
  • H-H-T
  • H-T-H

So the value counts would be:

H-T-H 2
T-H-H 1
H-H-H 1
H-H-T 1

I have thought of concatenating every 3 consecutive rows to build each window as a string and then calling value_counts on the result. Is that a valid approach? Or is there a more pandas-oriented way?
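
For reference, the expected counts can be reproduced with plain Python, which gives a baseline to check any pandas solution against. This is only a sketch: the names `outcomes`, `windows` and `expected` are made up for illustration, and the list simply repeats the values of the outcome column shown above.

```python
from collections import Counter

# Same values as the "outcome" column in the example (hypothetical name).
outcomes = list("HTHHHTH")
window = 3

# Every run of `window` consecutive values, in original order, joined with "-".
windows = [
    "-".join(outcomes[i:i + window])
    for i in range(len(outcomes) - window + 1)
]

# Count how many times each ordered window appears.
expected = Counter(windows)
print(expected)
```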

CodePudding user response:

Your approach is valid, but it might not be efficient for large arrays because string concatenation/aggregation is expensive.

You could use numpy here to benefit from the boolean-like aspect of your data:

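import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

# boolean column (True for 'H') cut into overlapping windows of 3
a = swv(df['outcome'].eq('H'), 3)
# unique windows (rows) and how many times each one occurs
vals, counts = np.unique(a, return_counts=True, axis=0)

# map the booleans back to 'H'/'T' and use one tuple per window as the index
labels = map(tuple, np.where(vals, 'H', 'T').tolist())
out = pd.Series(counts, index=pd.Index(list(labels), tupleize_cols=False))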

output:

(T, H, H)    1
(H, T, H)    2
(H, H, T)    1
(H, H, H)    1
dtype: int64
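
If the column held more than two distinct values, the boolean trick would no longer apply directly. One possible generalization, sketched below, encodes the labels as integers with pd.factorize first. The helper name `window_counts` and its `size` and `sep` parameters are hypothetical, and the sketch assumes the column has no missing values.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def window_counts(series, size, sep="-"):
    """Count each ordered window of `size` consecutive values.

    Hypothetical helper; assumes `series` contains no missing values.
    """
    # factorize maps every distinct label to an integer code
    codes, labels = pd.factorize(series)
    # overlapping windows over the integer codes, one window per row
    windows = sliding_window_view(codes, size)
    # unique rows and how many times each one occurs
    uniq, counts = np.unique(windows, axis=0, return_counts=True)
    # turn each row of codes back into a joined label such as "H-T-H"
    index = [sep.join(map(str, labels[row])) for row in uniq]
    return pd.Series(counts, index=index)


out = window_counts(pd.Series(list("HTHHHTH")), 3)
print(out)
```

Joining the labels with "-" gives the same string format as in the question, at the cost of some string work on the unique windows only.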

CodePudding user response:

Try this:

(data["outcome"] + "-" + data["outcome"].shift(-1) + "-" + data["outcome"].shift(-2)).dropna().value_counts()
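
The same idea can be written for any window size. The sketch below assumes negative shifts, so that each row is joined with the rows that follow it and the window keeps its original order. The variable `n` and the sample DataFrame are illustrative only.

```python
import pandas as pd

# Example data from the question; `data` and `n` are assumed names.
data = pd.DataFrame({"outcome": list("HTHHHTH")})
n = 3

# Start with the current row, then append the next n - 1 rows one by one.
windows = data["outcome"]
for k in range(1, n):
    # shift(-k) aligns the value k rows ahead with the current row
    windows = windows + "-" + data["outcome"].shift(-k)

# Rows that run past the end of the data contain NaN and are dropped.
result = windows.dropna().value_counts()
print(result)
```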