Let's say we have a sample dataframe df
which is CPI data having date
and MoM
columns, I need to calculate base_effect
and new_price_effect
based on them:
date MoM base_effect new_price_effect
0 2019-01 1.010 NaN NaN
1 2019-02 1.010 NaN NaN
2 2019-03 1.010 NaN NaN
3 2019-04 1.010 NaN NaN
4 2019-05 1.010 NaN NaN
5 2019-06 1.010 NaN NaN
6 2019-07 1.010 NaN NaN
7 2019-08 1.010 NaN NaN
8 2019-09 1.010 NaN NaN
9 2019-10 1.010 NaN NaN
10 2019-11 1.010 NaN NaN
11 2019-12 1.010 NaN NaN
12 2020-01 1.015 1.115668 1.015000
13 2020-02 1.015 1.104622 1.030225
14 2020-03 1.015 1.093685 1.045678
15 2020-04 1.015 1.082857 1.061364
16 2020-05 1.015 1.072135 1.077284
17 2020-06 1.015 1.061520 1.093443
18 2020-07 1.015 1.051010 1.109845
19 2020-08 1.015 1.040604 1.126493
20 2020-09 1.015 1.030301 1.143390
21 2020-10 1.015 1.020100 1.160541
22 2020-11 1.015 1.010000 1.177949
23 2020-12 1.015 1.000000 1.195618
Or:
df = pd.DataFrame([{'date': '2019-01', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-02', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-03', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-04', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-05', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-06', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-07', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-08', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-09', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-10', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-11', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2019-12', 'MoM': 1.01, 'base_effect': np.NaN, 'new_price_effect': np.NaN},
{'date': '2020-01',
'MoM': 1.015,
'base_effect': 1.115668347,
'new_price_effect': 1.015},
{'date': '2020-02',
'MoM': 1.015,
'base_effect': 1.104622125,
'new_price_effect': 1.030225},
{'date': '2020-03',
'MoM': 1.015,
'base_effect': 1.093685273,
'new_price_effect': 1.045678375},
{'date': '2020-04',
'MoM': 1.015,
'base_effect': 1.082856706,
'new_price_effect': 1.061363551},
{'date': '2020-05',
'MoM': 1.015,
'base_effect': 1.072135352,
'new_price_effect': 1.077284004},
{'date': '2020-06',
'MoM': 1.015,
'base_effect': 1.061520151,
'new_price_effect': 1.093443264},
{'date': '2020-07',
'MoM': 1.015,
'base_effect': 1.05101005,
'new_price_effect': 1.109844913},
{'date': '2020-08',
'MoM': 1.015,
'base_effect': 1.04060401,
'new_price_effect': 1.126492587},
{'date': '2020-09',
'MoM': 1.015,
'base_effect': 1.030301,
'new_price_effect': 1.143389975},
{'date': '2020-10',
'MoM': 1.015,
'base_effect': 1.0201,
'new_price_effect': 1.160540825},
{'date': '2020-11',
'MoM': 1.015,
'base_effect': 1.01,
'new_price_effect': 1.177948937},
{'date': '2020-12',
'MoM': 1.015,
'base_effect': 1.0,
'new_price_effect': 1.195618171}])
The calculation of base_effect
is based on the following formula:
ie: to calculate base effect of 2020-06
, we use MoM(2019-07) * MoM(2019-08) * MoM(2019-09) * MoM(2019-10) * MoM(2019-11) * MoM(2019-12) = 1.061520
, for 2020-09
, MoM(2019-10) * MoM(2019-11) * MoM(2019-12) = 1.030301
The new_price_effect
is based on:
ie: to calculate new price effect of 2020-04
, we use MoM(2020-01) * MoM(2020-02) * MoM(2020-03) * MoM(2020-04) = 1.061364
; for 2020-06
, MoM(2020-01) * MoM(2020-02) * MoM(2020-03) * MoM(2020-04) * MoM(2020-05) * MoM(2020-06) = 1.093443
, etc.
We can see for base_effect
, the MoM
of December
of previous year is fixed, while for new_price_effect
, the MoM
of January
of current year is fixed. And the base effect of for each December
is always 1.00
.
How could I calculate them with Pandas? Thanks.
CodePudding user response:
Try:
from dateutil.relativedelta import relativedelta
import pandas as pd
from numpy import nan
from datetime import datetime
data = [{'date': '2019-01', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-02', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-03', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-04', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-05', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-06', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-07', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-08', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-09', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-10', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-11', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2019-12', 'MoM': 1.01, 'base_effect': nan, 'new_price_effect': nan},
{'date': '2020-01',
'MoM': 1.015,
'base_effect': 1.115668347,
'new_price_effect': 1.015},
{'date': '2020-02',
'MoM': 1.015,
'base_effect': 1.104622125,
'new_price_effect': 1.030225},
{'date': '2020-03',
'MoM': 1.015,
'base_effect': 1.093685273,
'new_price_effect': 1.045678375},
{'date': '2020-04',
'MoM': 1.015,
'base_effect': 1.082856706,
'new_price_effect': 1.061363551},
{'date': '2020-05',
'MoM': 1.015,
'base_effect': 1.072135352,
'new_price_effect': 1.077284004},
{'date': '2020-06',
'MoM': 1.015,
'base_effect': 1.061520151,
'new_price_effect': 1.093443264},
{'date': '2020-07',
'MoM': 1.015,
'base_effect': 1.05101005,
'new_price_effect': 1.109844913},
{'date': '2020-08',
'MoM': 1.015,
'base_effect': 1.04060401,
'new_price_effect': 1.126492587},
{'date': '2020-09',
'MoM': 1.015,
'base_effect': 1.030301,
'new_price_effect': 1.143389975},
{'date': '2020-10',
'MoM': 1.015,
'base_effect': 1.0201,
'new_price_effect': 1.160540825},
{'date': '2020-11',
'MoM': 1.015,
'base_effect': 1.01,
'new_price_effect': 1.177948937},
{'date': '2020-12',
'MoM': 1.015,
'base_effect': 1.0,
'new_price_effect': 1.195618171}]
df = pd.DataFrame.from_dict(data)
df['date'] = pd.to_datetime(df['date'])
print(df)
def base_effect(x):
required_interval = pd.date_range(x['date'] - relativedelta(months=11), freq='MS',
periods=12 - x['date'].month)
df_sub = df[df.date.isin(required_interval)]
return df_sub['MoM'].product(min_count=1)
def new_price_effect(x):
required_interval = pd.date_range(datetime(x['date'].year, 1, 1), freq='MS', periods=x['date'].month)
df_sub = df[df.date.isin(required_interval)]
return df_sub['MoM'].product(min_count=1)
df['base_effect'] = df.apply(base_effect, axis=1)
df['new_price_effect'] = df.apply(new_price_effect, axis=1)
print(df)