Rolling Pandas unique values with same window size

Time:03-01

I would like to compute a rolling sum over unique values, keeping the same window size.

For example, if I have the values 20, 30, 30, 40, I want the sum of (20, 30, 40).


CodePudding user response:

Pandas offers a function called rolling for this. Here is an adapted example from the docs:

import pandas as pd

series = pd.Series([20, 30, 40, 50, 60])
series.rolling(3).sum()

Output:

0      NaN
1      NaN
2     90.0
3    120.0
4    150.0
dtype: float64

CodePudding user response:

You can aggregate consecutive groups of equal numbers and then apply a rolling sum to the first element of each:

import pandas as pd

# example dataframe
df = pd.DataFrame({'a': [20, 20, 30, 30, 40, 40, 50, 60]})

# label each run of consecutive equal values with its own group number
grouping = (df['a'] != df['a'].shift()).cumsum()

# group by run, take the first value of each run, then apply a rolling sum
df.groupby(grouping).agg({'a': 'first'}).rolling(3).sum()

Output:

       a
a
1    NaN
2    NaN
3   90.0
4  120.0
5  150.0
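
To make the grouping step above easier to follow, here is a minimal sketch that prints the intermediate run labels before the rolling sum is taken. The names is_new_run, run_id, firsts and rolled are only illustrative and do not appear in the answer itself.

```python
import pandas as pd

df = pd.DataFrame({'a': [20, 20, 30, 30, 40, 40, 50, 60]})

# True wherever a value differs from the one directly above it
# (the first row is compared against NaN, so it is always True)
is_new_run = df['a'] != df['a'].shift()

# The cumulative sum of the booleans gives one integer label per run
run_id = is_new_run.cumsum().rename('run_id')
print(run_id.tolist())  # [1, 1, 2, 2, 3, 3, 4, 5]

# One value per run, then a 3-wide rolling sum over those values
firsts = df['a'].groupby(run_id).first()
rolled = firsts.rolling(3).sum()
print(rolled)
```

Each label covers one run of equal values, so taking the first value per label leaves 20, 30, 40, 50, 60 for the rolling window to work on.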

CodePudding user response:

If the duplicates are grouped together as in your example, you can drop them with drop_duplicates() and then apply .rolling(3).sum() to the result, which no longer contains repeated values. Note that drop_duplicates() removes every repeated value in the series, not only consecutive ones.

import pandas as pd

series = pd.Series([20, 30, 30, 30, 40, 50, 50, 60])
unique_series = series.drop_duplicates()
unique_series.rolling(3, min_periods=1).sum()
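
The caveat about grouped duplicates matters when a value comes back later in the data. The following sketch uses a made-up series (not from the question) to contrast drop_duplicates() with removing only back-to-back repeats; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical series in which 30 reappears after a different value
series = pd.Series([20, 30, 30, 40, 30, 50])

# drop_duplicates keeps only the first occurrence of each value overall
globally_unique = series.drop_duplicates()

# Comparing with the previous row removes only back-to-back repeats
consecutive_unique = series[series != series.shift()]

print(globally_unique.tolist())     # [20, 30, 40, 50]
print(consecutive_unique.tolist())  # [20, 30, 40, 30, 50]
```

Which of the two is appropriate depends on whether a later reappearance of a value should count again in the rolling sum.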

After seeing pieterbarg's response above, I tried the following:

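import pandas as pd

df = pd.DataFrame({
    'value': [10, 20, 30, 50, 50, 50, 70, 80, 90, 90],
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# True on the first row of each run of consecutive equal values
grouping = (df['value'] != df['value'].shift())
# rolling sum over those rows only, then merge back on the original index
df2 = df[grouping].rolling(3).sum()['value'].rename('sum')
df = df.merge(df2, how='left', left_index=True, right_index=True)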

The output is as follows:

   value  id    sum
0     10   1    NaN
1     20   2    NaN
2     30   3   60.0
3     50   4  100.0
4     50   5    NaN
5     50   6    NaN
6     70   7  150.0
7     80   8  200.0
8     90   9  240.0
9     90  10    NaN

You can forward-fill the sums down to the repeated rows if you want. Recent pandas versions deprecate .fillna(method='ffill') in favour of .ffill():

df['sum'] = df['sum'].ffill()

This gives the following output:

   value  id    sum
0     10   1    NaN
1     20   2    NaN
2     30   3   60.0
3     50   4  100.0
4     50   5  100.0
5     50   6  100.0
6     70   7  150.0
7     80   8  200.0
8     90   9  240.0
9     90  10  240.0
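
The steps in this answer can also be wrapped into a single helper. This is only a sketch under the same assumption (duplicates are consecutive); rolling_sum_of_runs is a hypothetical name, and reindex is used here in place of the merge.

```python
import pandas as pd

def rolling_sum_of_runs(values: pd.Series, window: int = 3) -> pd.Series:
    """Rolling sum over consecutive-unique values, aligned to the original index."""
    # True on the first row of each run of equal values
    starts = values != values.shift()
    # Rolling sum computed only over those first rows
    rolled = values[starts].rolling(window).sum()
    # reindex leaves NaN on the repeated rows; ffill copies the run's sum down
    return rolled.reindex(values.index).ffill()

df = pd.DataFrame({'value': [10, 20, 30, 50, 50, 50, 70, 80, 90, 90]})
df['sum'] = rolling_sum_of_runs(df['value'])
print(df)
```

The first window - 1 rows stay NaN because there are not yet enough unique values to fill a window.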