Pandas - revert a cumsum with NaN values

Time:11-09

Is there a way to recover the original column from a column that is the cumsum() of it?

For example:

import numpy as np
import pandas as pd
df = pd.DataFrame({'Original': [1, 0, 0, 1, 0, 5, 0, np.nan, np.nan, 4, 0, 0],
                   'CumSum': [1, 1, 1, 2, 2, 7, 7, np.nan, np.nan, 11, 11, 11]})

In the example df above, is it possible to get the Original column back using only the CumSum column?

In my real dataset I only have a column like CumSum, and I want to recover the original values. I looked for a built-in function that does this but haven't found one.
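
For context, a column like CumSum is what pandas produces when cumsum() is called with its default skipna=True. The sketch below assumes that is how the column was built; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

original = pd.Series([1, 0, 0, 1, 0, 5, 0, np.nan, np.nan, 4, 0, 0])

# Series.cumsum() skips NaN by default (skipna=True): the NaN rows stay NaN
# in the result, and the running total continues after them.
cumsum = original.cumsum()
print(cumsum.tolist())
# [1.0, 1.0, 1.0, 2.0, 2.0, 7.0, 7.0, nan, nan, 11.0, 11.0, 11.0]
```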

Answer:

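You can forward-fill the NaN gaps, take the row-to-row difference, put NaN back on the rows where CumSum is NaN, and fill the first row (which has no previous row to diff against) from CumSum itself: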

df['Original2'] = (df['CumSum'].ffill().diff()
                   .mask(df['CumSum'].isna())
                   .fillna(df['CumSum'])
                  )

Output:

    Original  CumSum  Original2
0        1.0     1.0        1.0
1        0.0     1.0        0.0
2        0.0     1.0        0.0
3        1.0     2.0        1.0
4        0.0     2.0        0.0
5        5.0     7.0        5.0
6        0.0     7.0        0.0
7        NaN     NaN        NaN
8        NaN     NaN        NaN
9        4.0    11.0        4.0
10       0.0    11.0        0.0
11       0.0    11.0        0.0
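
To check the approach end to end, here is a minimal sketch that wraps the same chain in a helper and compares the result with the original values. The name revert_cumsum is hypothetical (it is not a pandas function), and the sketch assumes the cumulative column was produced with the default NaN-skipping cumsum().

```python
import numpy as np
import pandas as pd


def revert_cumsum(cumsum: pd.Series) -> pd.Series:
    """Hypothetical helper: undo a NaN-skipping cumulative sum."""
    # Carry the last known total across NaN rows so the next diff is correct.
    diffs = cumsum.ffill().diff()
    # Rows that were NaN in the input must stay NaN in the output.
    diffs = diffs.mask(cumsum.isna())
    # The first valid row has no predecessor; its value is the total itself.
    return diffs.fillna(cumsum)


original = pd.Series([1, 0, 0, 1, 0, 5, 0, np.nan, np.nan, 4, 0, 0])
restored = revert_cumsum(original.cumsum())

# Series.equals treats NaN values in the same positions as equal.
print(restored.equals(original))  # True
```

The same chain also handles a column that starts with NaN: the first non-NaN row has nothing to diff against, so it is filled from the cumulative column.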