Home > Back-end >  Calculate base effect based on month to month changes in Pandas
Calculate base effect based on month to month changes in Pandas

Time:10-10

Let's say we have a sample dataframe df which is CPI data having date and MoM columns, I need to calculate base_effect and new_price_effect based on them:

       date    MoM  base_effect  new_price_effect
0   2019-01  1.010          NaN               NaN
1   2019-02  1.010          NaN               NaN
2   2019-03  1.010          NaN               NaN
3   2019-04  1.010          NaN               NaN
4   2019-05  1.010          NaN               NaN
5   2019-06  1.010          NaN               NaN
6   2019-07  1.010          NaN               NaN
7   2019-08  1.010          NaN               NaN
8   2019-09  1.010          NaN               NaN
9   2019-10  1.010          NaN               NaN
10  2019-11  1.010          NaN               NaN
11  2019-12  1.010          NaN               NaN
12  2020-01  1.015     1.115668          1.015000
13  2020-02  1.015     1.104622          1.030225
14  2020-03  1.015     1.093685          1.045678
15  2020-04  1.015     1.082857          1.061364
16  2020-05  1.015     1.072135          1.077284
17  2020-06  1.015     1.061520          1.093443
18  2020-07  1.015     1.051010          1.109845
19  2020-08  1.015     1.040604          1.126493
20  2020-09  1.015     1.030301          1.143390
21  2020-10  1.015     1.020100          1.160541
22  2020-11  1.015     1.010000          1.177949
23  2020-12  1.015     1.000000          1.195618

Or:

df = pd.DataFrame([{'date': '2019-01', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-02', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-03', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-04', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-05', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-06', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-07', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-08', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-09', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-10', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-11', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2019-12', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
 {'date': '2020-01',
  'MoM': 1.015,
  'base_effect': 1.115668347,
  'new_price_effect': 1.015},
 {'date': '2020-02',
  'MoM': 1.015,
  'base_effect': 1.104622125,
  'new_price_effect': 1.030225},
 {'date': '2020-03',
  'MoM': 1.015,
  'base_effect': 1.093685273,
  'new_price_effect': 1.045678375},
 {'date': '2020-04',
  'MoM': 1.015,
  'base_effect': 1.082856706,
  'new_price_effect': 1.061363551},
 {'date': '2020-05',
  'MoM': 1.015,
  'base_effect': 1.072135352,
  'new_price_effect': 1.077284004},
 {'date': '2020-06',
  'MoM': 1.015,
  'base_effect': 1.061520151,
  'new_price_effect': 1.093443264},
 {'date': '2020-07',
  'MoM': 1.015,
  'base_effect': 1.05101005,
  'new_price_effect': 1.109844913},
 {'date': '2020-08',
  'MoM': 1.015,
  'base_effect': 1.04060401,
  'new_price_effect': 1.126492587},
 {'date': '2020-09',
  'MoM': 1.015,
  'base_effect': 1.030301,
  'new_price_effect': 1.143389975},
 {'date': '2020-10',
  'MoM': 1.015,
  'base_effect': 1.0201,
  'new_price_effect': 1.160540825},
 {'date': '2020-11',
  'MoM': 1.015,
  'base_effect': 1.01,
  'new_price_effect': 1.177948937},
 {'date': '2020-12',
  'MoM': 1.015,
  'base_effect': 1.0,
  'new_price_effect': 1.195618171}])

The calculation of base_effect is based on the following formula:

enter image description here

ie: to calculate base effect of 2020-06, we use MoM(2019-07) * MoM(2019-08) * MoM(2019-09) * MoM(2019-10) * MoM(2019-11) * MoM(2019-12) = 1.061520, for 2020-09, MoM(2019-10) * MoM(2019-11) * MoM(2019-12) = 1.030301

The new_price_effect is based on:

enter image description here

ie: to calculate new price effect of 2020-04, we use MoM(2020-01) * MoM(2020-02) * MoM(2020-03) * MoM(2020-04) = 1.061364; for 2020-06, MoM(2020-01) * MoM(2020-02) * MoM(2020-03) * MoM(2020-04) * MoM(2020-05) * MoM(2020-06) = 1.093443, etc.

We can see for base_effect, the MoM of December of previous year is fixed, while for new_price_effect, the MoM of January of current year is fixed. And the base effect of for each December is always 1.00.

How could I calculate them with Pandas? Thanks.

CodePudding user response:

Try:

from dateutil.relativedelta import relativedelta
import pandas as pd
from numpy import nan
from datetime import datetime

data = [{'date': '2019-01', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-02', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-03', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-04', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-05', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-06', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-07', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-08', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-09', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-10', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-11', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2019-12', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
        {'date': '2020-01',
         'MoM': 1.015,
         'base_effect': 1.115668347,
         'new_price_effect': 1.015},
        {'date': '2020-02',
         'MoM': 1.015,
         'base_effect': 1.104622125,
         'new_price_effect': 1.030225},
        {'date': '2020-03',
         'MoM': 1.015,
         'base_effect': 1.093685273,
         'new_price_effect': 1.045678375},
        {'date': '2020-04',
         'MoM': 1.015,
         'base_effect': 1.082856706,
         'new_price_effect': 1.061363551},
        {'date': '2020-05',
         'MoM': 1.015,
         'base_effect': 1.072135352,
         'new_price_effect': 1.077284004},
        {'date': '2020-06',
         'MoM': 1.015,
         'base_effect': 1.061520151,
         'new_price_effect': 1.093443264},
        {'date': '2020-07',
         'MoM': 1.015,
         'base_effect': 1.05101005,
         'new_price_effect': 1.109844913},
        {'date': '2020-08',
         'MoM': 1.015,
         'base_effect': 1.04060401,
         'new_price_effect': 1.126492587},
        {'date': '2020-09',
         'MoM': 1.015,
         'base_effect': 1.030301,
         'new_price_effect': 1.143389975},
        {'date': '2020-10',
         'MoM': 1.015,
         'base_effect': 1.0201,
         'new_price_effect': 1.160540825},
        {'date': '2020-11',
         'MoM': 1.015,
         'base_effect': 1.01,
         'new_price_effect': 1.177948937},
        {'date': '2020-12',
         'MoM': 1.015,
         'base_effect': 1.0,
         'new_price_effect': 1.195618171}]

df = pd.DataFrame.from_dict(data)
df['date'] = pd.to_datetime(df['date'])
print(df)


def base_effect(x):
    required_interval = pd.date_range(x['date'] - relativedelta(months=11), freq='MS',
                                      periods=12 - x['date'].month)
    df_sub = df[df.date.isin(required_interval)]
    return df_sub['MoM'].product(min_count=1)


def new_price_effect(x):
    required_interval = pd.date_range(datetime(x['date'].year, 1, 1), freq='MS', periods=x['date'].month)
    df_sub = df[df.date.isin(required_interval)]
    return df_sub['MoM'].product(min_count=1)

df['base_effect'] = df.apply(base_effect, axis=1)
df['new_price_effect'] = df.apply(new_price_effect, axis=1)
print(df)
  • Related