Given a df
as follows:
df = [{'date': '1980-01-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 190.3},
{'date': '1980-02-29 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 174.9},
{'date': '1980-03-31 00:00:00',
'yoy_1': 0.001470155,
'yoy_2': 0.002116268,
'value': 163.2},
{'date': '1980-04-30 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 168.4},
{'date': '1980-05-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 168.6},
{'date': '1980-06-30 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 168.2},
{'date': '1980-07-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 163.5},
{'date': '1980-08-31 00:00:00',
'yoy_1': 0.019327965,
'yoy_2': 0.002116268,
'value': 161.6},
{'date': '1980-09-30 00:00:00',
'yoy_1': 0.001203869,
'yoy_2': 0.002116268,
'value': 172.9},
{'date': '1980-10-31 00:00:00',
'yoy_1': 0.101000481,
'yoy_2': 0.222560596,
'value': 166.5},
{'date': '1980-11-30 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 175.2},
{'date': '1980-12-31 00:00:00',
'yoy_1': 0.102218761,
'yoy_2': 0.002116268,
'value': 197.7},
{'date': '1981-01-31 00:00:00',
'yoy_1': 0.001521527,
'yoy_2': 0.002116268,
'value': 212.1},
{'date': '1981-02-28 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.234524059,
'value': 177.9},
{'date': '1981-03-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 182.9},
{'date': '1981-04-30 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 184.2},
{'date': '1981-05-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 184.0},
{'date': '1981-06-30 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 182.4},
{'date': '1981-07-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 175.6},
{'date': '1981-08-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 172.0},
{'date': '1981-09-30 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 184.9},
{'date': '1981-10-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 184.7},
{'date': '1981-11-30 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 195.1},
{'date': '1981-12-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.086862869,
'value': 224.8},
{'date': '1982-01-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.226102276,
'value': 233.6},
{'date': '1982-02-28 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 182.0},
{'date': '1982-03-31 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 206.6},
{'date': '1982-04-30 00:00:00',
'yoy_1': 0.001521525,
'yoy_2': 0.002116268,
'value': 202.2}]
Out:
date yoy_1 yoy_2 value
0 1980-01-31 00:00:00 0.001522 0.002116 190.3
1 1980-02-29 00:00:00 0.001522 0.002116 174.9
2 1980-03-31 00:00:00 0.001470 0.002116 163.2
3 1980-04-30 00:00:00 0.001522 0.002116 168.4
4 1980-05-31 00:00:00 0.001522 0.002116 168.6
5 1980-06-30 00:00:00 0.001522 0.002116 168.2
6 1980-07-31 00:00:00 0.001522 0.002116 163.5
7 1980-08-31 00:00:00 0.019328 0.002116 161.6
8 1980-09-30 00:00:00 0.001204 0.002116 172.9
9 1980-10-31 00:00:00 0.101000 0.222561 166.5
10 1980-11-30 00:00:00 0.001522 0.002116 175.2
11 1980-12-31 00:00:00 0.102219 0.002116 197.7
12 1981-01-31 00:00:00 0.001522 0.002116 212.1
13 1981-02-28 00:00:00 0.001522 0.234524 177.9
14 1981-03-31 00:00:00 0.001522 0.002116 182.9
15 1981-04-30 00:00:00 0.001522 0.002116 184.2
16 1981-05-31 00:00:00 0.001522 0.002116 184.0
17 1981-06-30 00:00:00 0.001522 0.002116 182.4
18 1981-07-31 00:00:00 0.001522 0.002116 175.6
19 1981-08-31 00:00:00 0.001522 0.002116 172.0
20 1981-09-30 00:00:00 0.001522 0.002116 184.9
21 1981-10-31 00:00:00 0.001522 0.002116 184.7
22 1981-11-30 00:00:00 0.001522 0.002116 195.1
23 1981-12-31 00:00:00 0.001522 0.086863 224.8
24 1982-01-31 00:00:00 0.001522 0.226102 233.6
25 1982-02-28 00:00:00 0.001522 0.002116 182.0
26 1982-03-31 00:00:00 0.001522 0.002116 206.6
27 1982-04-30 00:00:00 0.001522 0.002116 202.2
I hope to calculate real values based on columns starts with yoy
and value
, with code below I'able to do that but it's not concise.
ie., for 1981-01-31 00:00:00
, the yoy_1_value
will be calculated by (1 yoy_1) * value from 1980-01-31 00:00:00
; the yoy_2_value
will be calculated by (1 yoy_2) * value from 1980-01-31 00:00:00
.
df['yoy_1_value'] = (1 df['yoy_1']).mul(df['value'].shift(12))
df['yoy_2_value'] = (1 df['yoy_2']).mul(df['value'].shift(12))
How could I improve it apply it multiple columns names startswith yoy_
? Thanks.
def cal_current_values(x):
return (1 x).mul(df['value'].shift(12))
To filter yoy
columns:
yoy_cols = [col for col in df if col.startswith('yoy')]
yoy_cols
The expected result:
date yoy_1 yoy_2 value yoy_1_value yoy_2_value
0 1980-01-31 00:00:00 0.001522 0.002116 190.3 NaN NaN
1 1980-02-29 00:00:00 0.001522 0.002116 174.9 NaN NaN
2 1980-03-31 00:00:00 0.001470 0.002116 163.2 NaN NaN
3 1980-04-30 00:00:00 0.001522 0.002116 168.4 NaN NaN
4 1980-05-31 00:00:00 0.001522 0.002116 168.6 NaN NaN
5 1980-06-30 00:00:00 0.001522 0.002116 168.2 NaN NaN
6 1980-07-31 00:00:00 0.001522 0.002116 163.5 NaN NaN
7 1980-08-31 00:00:00 0.019328 0.002116 161.6 NaN NaN
8 1980-09-30 00:00:00 0.001204 0.002116 172.9 NaN NaN
9 1980-10-31 00:00:00 0.101000 0.222561 166.5 NaN NaN
10 1980-11-30 00:00:00 0.001522 0.002116 175.2 NaN NaN
11 1980-12-31 00:00:00 0.102219 0.002116 197.7 NaN NaN
12 1981-01-31 00:00:00 0.001522 0.002116 212.1 190.589547 190.702726
13 1981-02-28 00:00:00 0.001522 0.234524 177.9 175.166115 215.918258
14 1981-03-31 00:00:00 0.001522 0.002116 182.9 163.448313 163.545375
15 1981-04-30 00:00:00 0.001522 0.002116 184.2 168.656225 168.756380
16 1981-05-31 00:00:00 0.001522 0.002116 184.0 168.856529 168.956803
17 1981-06-30 00:00:00 0.001522 0.002116 182.4 168.455921 168.555956
18 1981-07-31 00:00:00 0.001522 0.002116 175.6 163.748769 163.846010
19 1981-08-31 00:00:00 0.001522 0.002116 172.0 161.845878 161.941989
20 1981-09-30 00:00:00 0.001522 0.002116 184.9 173.163072 173.265903
21 1981-10-31 00:00:00 0.001522 0.002116 184.7 166.753334 166.852359
22 1981-11-30 00:00:00 0.001522 0.002116 195.1 175.466571 175.570770
23 1981-12-31 00:00:00 0.001522 0.086863 224.8 198.000805 214.872789
24 1982-01-31 00:00:00 0.001522 0.226102 233.6 212.422715 260.056293
25 1982-02-28 00:00:00 0.001522 0.002116 182.0 178.170679 178.276484
26 1982-03-31 00:00:00 0.001522 0.002116 206.6 183.178287 183.287065
27 1982-04-30 00:00:00 0.001522 0.002116 202.2 184.480265 184.589817
CodePudding user response:
Use DataFrame.filter
with ^
for start of strings, there is added axis=0
for correct multiple by column value
:
df = df.filter(regex='^yoy_').add(1).mul(df['value'].shift(12), axis=0)
print (df)
yoy_1 yoy_2
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
10 NaN NaN
11 NaN NaN
12 190.589547 190.702726
13 175.166115 215.918258
14 163.448313 163.545375
15 168.656225 168.756380
16 168.856529 168.956803
17 168.455921 168.555956
18 163.748769 163.846010
19 161.845878 161.941989
20 173.163072 173.265903
21 166.753334 166.852359
22 175.466571 175.570770
23 198.000805 214.872789
24 212.422715 260.056293
25 178.170679 178.276484
26 183.178287 183.287065
27 184.480265 184.589817
Add to original:
df = df.join(df.filter(regex='^yoy_').add(1).mul(df['value'].shift(12), axis=0).add_suffix('_values'))