Home > Software engineering >  Pandas - Group by Cumulative Calculation
Pandas - Group by Cumulative Calculation

Time:06-23

I'm trying to do a calculation with cumulative operations using pandas in jupyter notebook.

But I'm stuck.

I have the following DataFrame:

Date Type deposit withdrawal
2020/01/19 A 1000 0
2020/01/23 A 500 0
2020/02/05 A 0 200
2020/02/22 A 1000 200
2020/01/19 B 859.8 0
2020/02/04 B 514.33 50
2020/02/25 B 1463.14 0

I need to apply this formula, grouping by type:

Movement = IF(ISNULL([previous_date]); [Current_Deposit] - [withdrawal];([previous_movement] * (1  ([current_date]-[previous_date]) * 0,0001))   [Current_Deposit] - [withdrawal])

Example:

Date Type deposit withdrawal Movement Formula
2020/01/19 A 1000 0 1000 IF(ISNULL([NULL]); [1000] - [0];([0] * (1 ([2020/01/19]-[NULL]) * 0,0001)) [1000] - [0])
2020/01/23 A 500 0 1500.40 IF(ISNULL([2020/01/19]); [500] - [0];([1000] * (1 ([2020/01/23]-[2020/01/19]) * 0,0001)) [500] - [0])
2020/02/05 A 0 200 1302.35 IF(ISNULL([2020/01/23]); [0] - [200];([1500.40] * (1 ([2020/02/05]-[2020/01/23]) * 0,0001)) [0] - [200])
2020/02/22 A 1000 200 2104,56 IF(ISNULL([2020/02/05]); [1000] - [200];([1302.35] * (1 ([2020/02/22]-[2020/02/05]) * 0,0001)) [1000] - [200])
2020/01/19 B 859.8 0 859.80 IF(ISNULL([NULL]); [859.8] - [0];([0] * (1 ([2020/01/19]-[NULL]) * 0,0001)) [859.8] - [0])
2020/02/04 B 514.33 50 1325.51 IF(ISNULL([2020/01/19]); [514.33] - [50];([859.80] * (1 ([2020/02/04]-[2020/01/19]) * 0,0001)) [514.33] - [50])
2020/02/25 B 1463.14 0 2791.43 IF(ISNULL([2020/02/04]); [1463.14] - [0];([1325.51] * (1 ([2020/02/25]-[2020/02/04]) * 0,0001)) [1463.14] - [0])

I'm not able to accumulate the values to put in the formula.

I need to group the calculations by type, and accumulate the values to use in the current line. I'm new to python and I'm not able to implement it.

CodePudding user response:

I assume that your dataframe is sorted by Type and Date

# silencing chained assignment warning
pd.options.mode.chained_assignment = None

df['Movement'] = 0

df_new = df
for row in df.iterrows():
    x = row[1]
    y = df_new.shift().loc[x.name]
    if x['Type'] == y['Type']:
        # computing the given days compound interest should calculate using power
        compound_interest = pow(1.0001, pd.Timedelta(pd.to_datetime(x['Date']) - pd.to_datetime(y['Date'])).days)
        # if you wish to use your formula uncomment the following line
        # compound_interest = 1   0.0001 * pd.Timedelta(pd.to_datetime(x['Date']) - pd.to_datetime(y['Date'])).days
        df_new['Movement'].loc[x.name] = compound_interest * y['Movement']   x['deposit'] - x['withdrawal']
    else:
        df_new['Movement'].loc[x.name] = x['deposit']

print(df_new)

OUTPUT

      Date Type  deposit  withdrawal    Movement
2020/01/19    A  1000.00           0 1000.000000
2020/01/23    A   500.00           0 1500.400060
2020/02/05    A     0.00         200 1302.351751
2020/02/22    A  1000.00         200 2104.567521
2020/01/19    B   859.80           0  859.800000
2020/02/04    B   514.33          50 1325.506712
2020/02/25    B  1463.14           0 2791.433062
  • Related