I am looking for a way in pandas to count the number of sequential occurrences of a particular value in a time series.
Suppose I am performing an experiment where I flip a coin and get heads or tails (1 or 0). I record my results in a pandas series, and I wish to see how many instances (a count) I had with two sequential heads, three sequential heads, four sequential heads, and so on. Moreover, I wish it to be something of a rolling count, meaning that a sequence of the form (tails, heads, heads, heads, tails) will return a count of two instances of heads occurring in pairs, and a single count of a series of three heads.
Is there a natural way to do this with methods in a Series/DataFrame? I could do it with some for loops, but I am concerned about the cost of that.
Thanks.
Edit: requested input/output.
Input:
a = pd.DataFrame({'coin' : [0,1,1,1,0]})
print(a.summary_of_windows())
Output:
{1: 3,
2: 2,
3: 1}
The output could be a dictionary: key 1 counts individual heads (three occurred), key 2 counts pairs of sequential heads (there are two of those), and key 3 counts sequences of three heads (happened once).
CodePudding user response:
You can use DataFrame.rolling:
>>> df
coin
0 0
1 1
2 1
3 1
4 0
# Compute how many sequences of two heads there are:
>>> df['coin'].rolling(2).sum().eq(2).sum()
2
# Do it for sequences of three heads:
# remember to change both the window size AND the value passed to eq()
>>> df['coin'].rolling(3).sum().eq(3).sum()
1
# Find the total number of heads occurrences:
>>> df['coin'].sum()
3