pandas cumsum on lag-differenced dataframe

Time:08-11

Say I have a pd.DataFrame that I differenced with .diff(5), which works like "new number at idx i = (number at idx i) - (number at idx i-5)".

import pandas as pd
import random
example_df = pd.DataFrame(data=random.sample(range(1, 100), 20), columns=["number"])
df_diff = example_df.diff(5)
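To make the definition concrete, here is a small sketch with fixed values instead of random ones. The values and the names s and d are made up for illustration and are not part of the question.

```python
import pandas as pd

# Fixed, hand-checkable values (an assumption for illustration only).
s = pd.DataFrame({"number": [10, 20, 30, 40, 50, 11, 22, 33, 44, 55]})
d = s.diff(5)

# The first 5 rows have no partner 5 rows earlier, so they become NaN.
# Row 5 is 11 - 10, row 6 is 22 - 20, and so on.
print(d["number"].tolist())
# [nan, nan, nan, nan, nan, 1.0, 2.0, 3.0, 4.0, 5.0]
```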

Now I want to undo this operation using the first 5 entries of example_df, and using df_diff.

If I had done .diff(1), I would simply use .cumsum(). But how can I make it sum only every 5th value?

My desired output is a df with the following values:

example_df[0]
example_df[1]
example_df[2]
example_df[3]
example_df[4]
df_diff[5] + example_df[0]
df_diff[6] + example_df[1]
df_diff[7] + example_df[2]
df_diff[8] + example_df[3]
...

CodePudding user response:

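You could shift the original column, add it to the differences, and fill the NaNs (note that this reads from all of example_df, not just its first 5 entries):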

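# Original values moved down by 5 rows, so row i holds example_df at i-5
df_diff["shifted"] = example_df["number"].shift(5)
# difference + value 5 rows earlier = original value (NaN for the first 5 rows)
df_diff["undone"] = df_diff["number"] + df_diff["shifted"]
# The first 5 rows cannot be recovered from the diff; take them from the original
df_diff["undone"] = df_diff["undone"].fillna(example_df["number"])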
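The approach above reads the shifted values from the full example_df. If only the first 5 entries of example_df are available, as the question asks, one possible sketch is to seed the NaN rows of df_diff with those 5 entries and then take a cumulative sum within each "chain" of rows that are 5 apart. The names lag, seeded, chain, and restored are introduced here for illustration and do not come from the original answer.

```python
import random

import numpy as np
import pandas as pd

example_df = pd.DataFrame(data=random.sample(range(1, 100), 20), columns=["number"])
lag = 5
df_diff = example_df.diff(lag)

# Replace the first `lag` rows (NaN after diff) with the known original values.
seeded = pd.concat([example_df.iloc[:lag].astype(float), df_diff.iloc[lag:]])

# Rows i, i+lag, i+2*lag, ... form one chain; label each row by its chain.
chain = np.arange(len(seeded)) % lag

# A cumulative sum within each chain undoes the lagged difference.
restored = seeded.groupby(chain).cumsum()

print((restored["number"] == example_df["number"]).all())
```

This is the lagged analogue of using .cumsum() to undo .diff(1): each of the 5 interleaved chains is an ordinary first difference, so it can be summed on its own.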