Home > front end >  Calculate multiple columns' current values based on yoy change and value of previous year in Pa
Calculate multiple columns' current values based on yoy change and value of previous year in Pa

Time:11-05

Given a df as follows:

df = [{'date': '1980-01-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 190.3},
 {'date': '1980-02-29 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 174.9},
 {'date': '1980-03-31 00:00:00',
  'yoy_1': 0.001470155,
  'yoy_2': 0.002116268,
  'value': 163.2},
 {'date': '1980-04-30 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 168.4},
 {'date': '1980-05-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 168.6},
 {'date': '1980-06-30 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 168.2},
 {'date': '1980-07-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 163.5},
 {'date': '1980-08-31 00:00:00',
  'yoy_1': 0.019327965,
  'yoy_2': 0.002116268,
  'value': 161.6},
 {'date': '1980-09-30 00:00:00',
  'yoy_1': 0.001203869,
  'yoy_2': 0.002116268,
  'value': 172.9},
 {'date': '1980-10-31 00:00:00',
  'yoy_1': 0.101000481,
  'yoy_2': 0.222560596,
  'value': 166.5},
 {'date': '1980-11-30 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 175.2},
 {'date': '1980-12-31 00:00:00',
  'yoy_1': 0.102218761,
  'yoy_2': 0.002116268,
  'value': 197.7},
 {'date': '1981-01-31 00:00:00',
  'yoy_1': 0.001521527,
  'yoy_2': 0.002116268,
  'value': 212.1},
 {'date': '1981-02-28 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.234524059,
  'value': 177.9},
 {'date': '1981-03-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 182.9},
 {'date': '1981-04-30 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 184.2},
 {'date': '1981-05-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 184.0},
 {'date': '1981-06-30 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 182.4},
 {'date': '1981-07-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 175.6},
 {'date': '1981-08-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 172.0},
 {'date': '1981-09-30 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 184.9},
 {'date': '1981-10-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 184.7},
 {'date': '1981-11-30 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 195.1},
 {'date': '1981-12-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.086862869,
  'value': 224.8},
 {'date': '1982-01-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.226102276,
  'value': 233.6},
 {'date': '1982-02-28 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 182.0},
 {'date': '1982-03-31 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 206.6},
 {'date': '1982-04-30 00:00:00',
  'yoy_1': 0.001521525,
  'yoy_2': 0.002116268,
  'value': 202.2}]

Out:

                   date     yoy_1     yoy_2  value
0   1980-01-31 00:00:00  0.001522  0.002116  190.3
1   1980-02-29 00:00:00  0.001522  0.002116  174.9
2   1980-03-31 00:00:00  0.001470  0.002116  163.2
3   1980-04-30 00:00:00  0.001522  0.002116  168.4
4   1980-05-31 00:00:00  0.001522  0.002116  168.6
5   1980-06-30 00:00:00  0.001522  0.002116  168.2
6   1980-07-31 00:00:00  0.001522  0.002116  163.5
7   1980-08-31 00:00:00  0.019328  0.002116  161.6
8   1980-09-30 00:00:00  0.001204  0.002116  172.9
9   1980-10-31 00:00:00  0.101000  0.222561  166.5
10  1980-11-30 00:00:00  0.001522  0.002116  175.2
11  1980-12-31 00:00:00  0.102219  0.002116  197.7
12  1981-01-31 00:00:00  0.001522  0.002116  212.1
13  1981-02-28 00:00:00  0.001522  0.234524  177.9
14  1981-03-31 00:00:00  0.001522  0.002116  182.9
15  1981-04-30 00:00:00  0.001522  0.002116  184.2
16  1981-05-31 00:00:00  0.001522  0.002116  184.0
17  1981-06-30 00:00:00  0.001522  0.002116  182.4
18  1981-07-31 00:00:00  0.001522  0.002116  175.6
19  1981-08-31 00:00:00  0.001522  0.002116  172.0
20  1981-09-30 00:00:00  0.001522  0.002116  184.9
21  1981-10-31 00:00:00  0.001522  0.002116  184.7
22  1981-11-30 00:00:00  0.001522  0.002116  195.1
23  1981-12-31 00:00:00  0.001522  0.086863  224.8
24  1982-01-31 00:00:00  0.001522  0.226102  233.6
25  1982-02-28 00:00:00  0.001522  0.002116  182.0
26  1982-03-31 00:00:00  0.001522  0.002116  206.6
27  1982-04-30 00:00:00  0.001522  0.002116  202.2

I hope to calculate real values based on columns starts with yoy and value, with code below I'able to do that but it's not concise.

ie., for 1981-01-31 00:00:00, the yoy_1_value will be calculated by (1 yoy_1) * value from 1980-01-31 00:00:00; the yoy_2_value will be calculated by (1 yoy_2) * value from 1980-01-31 00:00:00.

df['yoy_1_value'] = (1   df['yoy_1']).mul(df['value'].shift(12))
df['yoy_2_value'] = (1   df['yoy_2']).mul(df['value'].shift(12))

How could I improve it apply it multiple columns names startswith yoy_? Thanks.

def cal_current_values(x):
    return (1   x).mul(df['value'].shift(12))

To filter yoy columns:

yoy_cols = [col for col in df if col.startswith('yoy')]
yoy_cols

The expected result:

                   date     yoy_1     yoy_2  value  yoy_1_value  yoy_2_value
0   1980-01-31 00:00:00  0.001522  0.002116  190.3          NaN          NaN
1   1980-02-29 00:00:00  0.001522  0.002116  174.9          NaN          NaN
2   1980-03-31 00:00:00  0.001470  0.002116  163.2          NaN          NaN
3   1980-04-30 00:00:00  0.001522  0.002116  168.4          NaN          NaN
4   1980-05-31 00:00:00  0.001522  0.002116  168.6          NaN          NaN
5   1980-06-30 00:00:00  0.001522  0.002116  168.2          NaN          NaN
6   1980-07-31 00:00:00  0.001522  0.002116  163.5          NaN          NaN
7   1980-08-31 00:00:00  0.019328  0.002116  161.6          NaN          NaN
8   1980-09-30 00:00:00  0.001204  0.002116  172.9          NaN          NaN
9   1980-10-31 00:00:00  0.101000  0.222561  166.5          NaN          NaN
10  1980-11-30 00:00:00  0.001522  0.002116  175.2          NaN          NaN
11  1980-12-31 00:00:00  0.102219  0.002116  197.7          NaN          NaN
12  1981-01-31 00:00:00  0.001522  0.002116  212.1   190.589547   190.702726
13  1981-02-28 00:00:00  0.001522  0.234524  177.9   175.166115   215.918258
14  1981-03-31 00:00:00  0.001522  0.002116  182.9   163.448313   163.545375
15  1981-04-30 00:00:00  0.001522  0.002116  184.2   168.656225   168.756380
16  1981-05-31 00:00:00  0.001522  0.002116  184.0   168.856529   168.956803
17  1981-06-30 00:00:00  0.001522  0.002116  182.4   168.455921   168.555956
18  1981-07-31 00:00:00  0.001522  0.002116  175.6   163.748769   163.846010
19  1981-08-31 00:00:00  0.001522  0.002116  172.0   161.845878   161.941989
20  1981-09-30 00:00:00  0.001522  0.002116  184.9   173.163072   173.265903
21  1981-10-31 00:00:00  0.001522  0.002116  184.7   166.753334   166.852359
22  1981-11-30 00:00:00  0.001522  0.002116  195.1   175.466571   175.570770
23  1981-12-31 00:00:00  0.001522  0.086863  224.8   198.000805   214.872789
24  1982-01-31 00:00:00  0.001522  0.226102  233.6   212.422715   260.056293
25  1982-02-28 00:00:00  0.001522  0.002116  182.0   178.170679   178.276484
26  1982-03-31 00:00:00  0.001522  0.002116  206.6   183.178287   183.287065
27  1982-04-30 00:00:00  0.001522  0.002116  202.2   184.480265   184.589817

CodePudding user response:

Use DataFrame.filter with ^ for start of strings, there is added axis=0 for correct multiple by column value:

df = df.filter(regex='^yoy_').add(1).mul(df['value'].shift(12), axis=0)

print (df)
         yoy_1       yoy_2
0          NaN         NaN
1          NaN         NaN
2          NaN         NaN
3          NaN         NaN
4          NaN         NaN
5          NaN         NaN
6          NaN         NaN
7          NaN         NaN
8          NaN         NaN
9          NaN         NaN
10         NaN         NaN
11         NaN         NaN
12  190.589547  190.702726
13  175.166115  215.918258
14  163.448313  163.545375
15  168.656225  168.756380
16  168.856529  168.956803
17  168.455921  168.555956
18  163.748769  163.846010
19  161.845878  161.941989
20  173.163072  173.265903
21  166.753334  166.852359
22  175.466571  175.570770
23  198.000805  214.872789
24  212.422715  260.056293
25  178.170679  178.276484
26  183.178287  183.287065
27  184.480265  184.589817

Add to original:

df = df.join(df.filter(regex='^yoy_').add(1).mul(df['value'].shift(12), axis=0).add_suffix('_values'))
  • Related