I have a Timeseries dataframe with different values:
ID TimeString value1 value2 StampDif
0 2021-02-10 17:30:39 0.5 5.2 NaT
1 2021-02-10 17:33:39 0.7 5.5 0 days 00:03:00
2 2021-02-10 17:36:40 0.9 5.5 0 days 00:03:01
3 2021-02-10 17:39:40 0.6 5.4 0 days 00:03:00
4 2021-02-10 17:42:40 0.8 5.0 0 days 00:00:01
.
.
.
Now I want to merge all rows that have a Stampdif of 1 second with the previous row using the mean. I have tried:
secdf = df[df["StampDif"] <= pd.Timedelta(1, "sec")]
for idx, row in secdf.iterrows():
df.iloc[idx-1, dfnanpv.columns != ["TimeString", "StampDif"]] = df.iloc[idx-1:idx 1].mean(axis=0)
But it throws and error: 'Shapes must match', (25,), (2,)
As a result, I want the following for example in row 3:
ID TimeString value1 value2 StampDif
3 2021-02-10 17:39:40 0.7 5.2 0 days 00:03:00
CodePudding user response:
Try this:
exclude_columns = ['col1', 'col2']
new_df = df.groupby((df['StampDif'] > pd.Timedelta(1, 'second')).cumsum()).agg({col: 'mean' for col in df.columns.difference(exclude_columns)}).reset_index(drop=True)
Output:
>>> new_df
TimeString value1 value2
0 2021-02-10 17:30:39 0.5 5.2
1 2021-02-10 17:33:39 0.7 5.5
2 2021-02-10 17:36:40 0.9 5.5
3 2021-02-10 17:41:10 0.7 5.2