I would like to compute a rolling sum over unique values, keeping the same window count.
For example, if I have the values 20, 30, 30, 40, I want the sum of (20, 30, 40).
CodePudding user response:
pandas offers a method called rolling for this. Here is an adapted example from the docs:
import pandas as pd
series = pd.Series([20, 30, 40, 50, 60])
series.rolling(3).sum()
Output:
0 NaN
1 NaN
2 90.0
3 120.0
4 150.0
dtype: float64
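Note that the plain rolling sum above counts a repeated value every time it appears in a window. If the goal is to count each distinct value only once per window, one possible sketch (an assumption about what the question means, not the only reading) is to pass a custom function to rolling(...).apply. The variable name unique_sum is made up for illustration:

```python
import numpy as np
import pandas as pd

series = pd.Series([20, 30, 30, 40])

# raw=True hands each window to the function as a NumPy array;
# np.unique drops repeats inside that window before summing.
unique_sum = series.rolling(3).apply(lambda w: np.unique(w).sum(), raw=True)
print(unique_sum)
```

Here the window (20, 30, 30) contributes 20 + 30 and the window (30, 30, 40) contributes 30 + 40. The window still spans three rows, so this does not widen the window to reach three distinct values.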
CodePudding user response:
You can aggregate consecutive groups of equal numbers and then apply a rolling sum to the first element of each:
import pandas as pd

# example dataframe
df = pd.DataFrame({'a': [20, 20, 30, 30, 40, 40, 50, 60]})
# give each run of consecutive equal values its own group id
grouping = (df['a'] != df['a'].shift()).cumsum()
# groupby and select first of each group, then apply rolling sum
df.groupby(grouping).agg({'a': 'first'}).rolling(3).sum()
output:
       a
a
1    NaN
2    NaN
3   90.0
4  120.0
5  150.0
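The result above has one row per group rather than one row per original row. If the sums are needed on the original rows, a minimal sketch is to map each row's group id to its group's rolling sum. The column name rolling_sum and the intermediate names firsts and rolled are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'a': [20, 20, 30, 30, 40, 40, 50, 60]})

# Group id that increases whenever the value changes from the previous row.
grouping = (df['a'] != df['a'].shift()).cumsum()

# One value per run of equal numbers, then a rolling sum over those values.
firsts = df['a'].groupby(grouping).first()
rolled = firsts.rolling(3).sum()

# Look up each row's group id in the rolled result (indexed by group id).
df['rolling_sum'] = grouping.map(rolled)
print(df)
```

Every row in a run then carries the same sum, and rows belonging to the first two runs stay NaN because the window is not yet full.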
CodePudding user response:
If the duplicates are grouped together as in your example, you can drop them with drop_duplicates() and then apply .rolling(3).sum() to the result, which no longer contains repeated values.
series = pd.Series([20, 30, 30, 30, 40, 50, 50, 60])
unique_series = series.drop_duplicates()
unique_series.rolling(3, min_periods=1).sum()
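The condition that duplicates are grouped matters because drop_duplicates() removes every later occurrence of a value, not just adjacent repeats. A small sketch with made-up data, where 20 reappears after a different value, shows how it differs from keeping the first row of each consecutive run:

```python
import pandas as pd

# Hypothetical data: 20 comes back after a different value.
series = pd.Series([20, 30, 30, 20, 40])

# drop_duplicates() keeps only the first occurrence of each value overall.
globally_unique = series.drop_duplicates()

# Comparing with the previous row keeps the first row of every run instead.
run_starts = series[series != series.shift()]

print(globally_unique.tolist())  # [20, 30, 40]
print(run_starts.tolist())       # [20, 30, 20, 40]
```

Which of the two is appropriate depends on whether a value that returns later should count again.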
After seeing pieterbarg's response above, I tried the following:
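df = pd.DataFrame({
    'value': [10, 20, 30, 50, 50, 50, 70, 80, 90, 90],
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})
# True on the first row of each run of consecutive equal values
grouping = (df['value'] != df['value'].shift())
# rolling sum over those first rows only; the original index is kept
df2 = df[grouping].rolling(3).sum()['value'].rename('sum')
# align the sums back onto the full dataframe by index
df = df.merge(df2, how='left', left_index=True, right_index=True)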
The output is as follows:
   value  id    sum
0     10   1    NaN
1     20   2    NaN
2     30   3   60.0
3     50   4  100.0
4     50   5    NaN
5     50   6    NaN
6     70   7  150.0
7     80   8  200.0
8     90   9  240.0
9     90  10    NaN
You can use .ffill() to fill the values down if you want this. (.fillna(method='ffill') does the same thing but is deprecated in recent pandas versions.)
df['sum'] = df['sum'].ffill()
Gives an output as follows:
   value  id    sum
0     10   1    NaN
1     20   2    NaN
2     30   3   60.0
3     50   4  100.0
4     50   5  100.0
5     50   6  100.0
6     70   7  150.0
7     80   8  200.0
8     90   9  240.0
9     90  10  240.0