Is there a method to aggregate time series in pandas based on the sequential count of an occurrence?

Time:12-17

I am looking for a way in pandas to count the number of sequential occurrences of a particular value in a time series.

Suppose I am performing an experiment where I flip a coin and get heads or tails (1 or 0). I record my results in a pandas Series, and I wish to see how many instances (a count) I had of two sequential heads, three sequential heads, four sequential heads, and so on. Moreover, I want it to be a rolling count, so overlapping windows are counted separately: a sequence of the form (tails, heads, heads, heads, tails) should return a count of two for pairs of heads and a count of one for a run of three heads.

Is there a natural way to do this with methods in a Series/DataFrame? I could do it with some for loops, but I am concerned about the cost of that.

Thanks.

Edit: requested input/output.

Input:

import pandas as pd

a = pd.DataFrame({'coin' : [0,1,1,1,0]})
print(a.summary_of_windows())  # hypothetical method: the behavior I am asking for

Output:

{1: 3,
 2: 2,
 3: 1}

The output could be a dictionary: key 1 counts individual heads (three occurred), key 2 counts pairs of sequential heads (there are two of those), and key 3 counts sequences of three heads (this happened once).

CodePudding user response:

You can use Series.rolling on the coin column:

>>> df
   coin
0     0
1     1
2     1
3     1
4     0

# Compute how many sequences of two heads there are:
>>> df['coin'].rolling(2).sum().eq(2).sum()
2

# Do it for sequences of three heads:
#   remember to change v    AND    v
>>> df['coin'].rolling(3).sum().eq(3).sum()
1

# Find the total number of heads occurrences:
>>> df['coin'].sum()
3
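
To get the dictionary the question asks for, the same rolling check can be repeated for each window length. The following is only a sketch: the function name summary_of_windows is borrowed from the question and is not a pandas method, and the value parameter is an assumption added for illustration.

```python
import pandas as pd


def summary_of_windows(series: pd.Series, value=1) -> dict:
    """Map each window length k to the number of length-k windows
    made up entirely of `value` (overlapping windows count separately).

    Hypothetical helper, not part of pandas.
    """
    # 1 where the series equals `value`, 0 elsewhere
    hits = series.eq(value).astype(int)
    counts = {}
    for k in range(1, len(hits) + 1):
        # A window of length k is all `value` exactly when its sum is k
        n = int(hits.rolling(k).sum().eq(k).sum())
        if n == 0:
            # No run of length k means no longer run exists either
            break
        counts[k] = n
    return counts


a = pd.DataFrame({'coin': [0, 1, 1, 1, 0]})
print(summary_of_windows(a['coin']))  # {1: 3, 2: 2, 3: 1}
```

The loop still runs once per window length, but each pass is vectorized, and it stops at the first length with no matches, so the cost scales with the longest run rather than with the length of the series.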