Let's say that we have a DataFrame that simply contains the outcome of tossing a coin N times.
outcome
0 H
1 T
2 H
3 H
4 H
5 T
6 H
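For anyone who wants to reproduce this, a minimal sketch of how the frame above could be built. The variable name `df` is only an illustrative choice, not something fixed by the problem:

```python
import pandas as pd

# The seven tosses from the example above, one per row.
# `df` is an assumed name used only for illustration.
df = pd.DataFrame({"outcome": list("HTHHHTH")})
print(df)
```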
For our example, let's suppose that we'd like to examine a sliding window of 3 and we'd like to count how many times each window (order preserved) appeared in the dataset.
The sliding windows of 3 in the dataset are:
- H-T-H
- T-H-H
- H-H-H
- H-H-T
- H-T-H
So the value counts would be:
H-T-H 2
T-H-H 1
H-H-H 1
H-H-T 1
I have thought of concatenating each run of 3 consecutive rows into a string representation of the window and then calling value_counts on the result. Is that a valid approach? Or is there a more pandas-oriented way?
CodePudding user response:
Your approach is valid but might not be efficient for large arrays, as string concatenation/aggregation is expensive.
You could use numpy here to benefit from the boolean-like aspect of your data:
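import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv
# each row of `a` is one window of 3 (True for 'H', False for 'T')
a = swv(df['outcome'].eq('H').to_numpy(), 3)
vals, counts = np.unique(a, return_counts=True, axis=0)
# label each unique window with one tuple, e.g. ('T', 'H', 'H')
out = pd.Series(counts, index=pd.Index(list(map(tuple, np.where(vals, 'H', 'T').tolist())), tupleize_cols=False))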
output:
(T, H, H) 1
(H, T, H) 2
(H, H, T) 1
(H, H, H) 1
dtype: int64
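If the data has more than two possible outcomes (a die roll, say), the boolean trick no longer applies directly. One possible variation, offered as a sketch rather than a tested recipe for large data, is to encode the values as integers with `pd.factorize` first. The helper name `window_counts` and its parameters are hypothetical, and `sliding_window_view` needs NumPy 1.20 or later:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def window_counts(s, n=3, sep="-"):
    """Count each ordered window of length n in Series s (hypothetical helper)."""
    # Encode every distinct outcome as an integer code
    codes, labels = pd.factorize(s)
    labels = np.asarray(labels)
    # Each row of `windows` is one run of n consecutive codes
    windows = sliding_window_view(codes, n)
    vals, counts = np.unique(windows, axis=0, return_counts=True)
    # Map the codes of each unique window back to labels and join them
    index = [sep.join(map(str, labels[row])) for row in vals]
    return pd.Series(counts, index=index)


df = pd.DataFrame({"outcome": list("HTHHHTH")})
out = window_counts(df["outcome"], 3)
print(out)
```

Note that `sliding_window_view` raises a `ValueError` when the Series is shorter than the window, so a length check may be worth adding for real data.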
CodePudding user response:
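Try this; shift(-1) and shift(-2) pull in the next two rows, so each string reads in the original order:
(data["outcome"] + "-" + data["outcome"].shift(-1) + "-" + data["outcome"].shift(-2)).dropna().value_counts()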
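The same shift-and-concatenate idea can probably be generalized to any window size with a small loop. This is a sketch that assumes the column holds strings; `window_value_counts`, `n` and `sep` are names made up for illustration:

```python
import pandas as pd


def window_value_counts(s, n=3, sep="-"):
    """Count ordered windows of length n by joining shifted copies of s."""
    joined = s
    for k in range(1, n):
        # shift(-k) aligns the value k rows ahead with the current row
        joined = joined + sep + s.shift(-k)
    # The last n - 1 rows have no complete window and come out as NaN
    return joined.dropna().value_counts()


data = pd.DataFrame({"outcome": list("HTHHHTH")})
counts = window_value_counts(data["outcome"], 3)
print(counts)
```

Each step builds a new string column, so memory use grows with the window size; for long windows the numpy approach above is likely the better fit.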