I have a set of time series for different groups of financial institutions that runs from 2019-03 to 2021-12. The columns in the dataframes are quarter and capital buffer rates, as seen below.
Quarter | Capital buffer rate (%) |
---|---|
2019-03 | 38 |
2019-06 | 43 |
2019-09 | 38 |
2019-12 | 37 |
2020-03 | 37 |
2020-06 | 37 |
2020-09 | 37 |
2020-12 | 35 |
2021-03 | 37 |
2021-06 | 41 |
2021-09 | 43 |
2021-12 | 44 |
I have five sets of these (for investment firms, banks, etc). I would like to add new columns to each dataframe, which have the percentage point change from year to year (e.g. 4 percentage points from 2020-06 to 2021-06). This means taking the difference between every fifth cell. For the quarters that don't have preceding yearly values (2019-03, 2019-06, 2019-09, and 2019-12), I would like to add a null value.
I made it work using a fairly clunky for-loop, subtracting the value four cells above, adding the values to a list, and adding the list to the dataframe, like this:
for i in range(0, len(df)):
if i <= 3:
diff = np.nan
else:
diff = df['Capital buffer rate'][i] - df['Capital buffer rate'][i-4]
list.append(diff)
df['Percentage point change'] = list
But this doesn't feel especially robust. And since I'm doing this for five different dataframes, I guess I have to create five variables in each loop and five lists? Is there maybe a simpler and better way of doing this?
CodePudding user response:
I think you want this:
# set up problem
idx = pd.DatetimeIndex(pd.date_range('2019-03', '2022-01', freq='Q'))
df = pd.DataFrame({'Capital buffer rate (%)': [38, 43, 38, 37, 37, 37, 37, 35, 37, 41, 43, 44]}, index=idx)
# solution:
df['Percentage point change'] = df['Capital buffer rate (%)'] - df['Capital buffer rate (%)'].shift(4)
Output:
Capital buffer rate (%) Percentage point change
2019-03-31 38 NaN
2019-06-30 43 NaN
2019-09-30 38 NaN
2019-12-31 37 NaN
2020-03-31 37 -1.0
2020-06-30 37 -6.0
2020-09-30 37 -1.0
2020-12-31 35 -2.0
2021-03-31 37 0.0
2021-06-30 41 4.0
2021-09-30 43 6.0
2021-12-31 44 9.0