Home > front end >  Pandas: Create a new column with the difference between every fifth cell in another column
Pandas: Create a new column with the difference between every fifth cell in another column

Time:07-15

I have a set of time series for different groups of financial institutions that runs from 2019-03 to 2021-12. The columns in the dataframes are quarter and capital buffer rates, as seen below.

Quarter Capital buffer rate (%)
2019-03 38
2019-06 43
2019-09 38
2019-12 37
2020-03 37
2020-06 37
2020-09 37
2020-12 35
2021-03 37
2021-06 41
2021-09 43
2021-12 44

I have five sets of these (for investment firms, banks, etc). I would like to add new columns to each dataframe, which have the percentage point change from year to year (e.g. 4 percentage points from 2020-06 to 2021-06). This means taking the difference between every fifth cell. For the quarters that don't have preceding yearly values (2019-03, 2019-06, 2019-09, and 2019-12), I would like to add a null value.

I made it work using a fairly clunky for-loop, subtracting the value four cells above, adding the values to a list, and adding the list to the dataframe, like this:


for i in range(0, len(df)):
    if i <= 3:
        diff = np.nan
    else:  
        diff = df['Capital buffer rate'][i] - df['Capital buffer rate'][i-4]
    
    list.append(diff)

df['Percentage point change'] = list

But this doesn't feel especially robust. And since I'm doing this for five different dataframes, I guess I have to create five variables in each loop and five lists? Is there maybe a simpler and better way of doing this?

CodePudding user response:

I think you want this:

# set up problem
idx = pd.DatetimeIndex(pd.date_range('2019-03', '2022-01', freq='Q'))
df = pd.DataFrame({'Capital buffer rate (%)': [38, 43, 38, 37, 37, 37, 37, 35, 37, 41, 43, 44]}, index=idx)

# solution:
df['Percentage point change'] = df['Capital buffer rate (%)'] - df['Capital buffer rate (%)'].shift(4)

Output:

 Capital buffer rate (%) Percentage point change
2019-03-31  38  NaN
2019-06-30  43  NaN
2019-09-30  38  NaN
2019-12-31  37  NaN
2020-03-31  37  -1.0
2020-06-30  37  -6.0
2020-09-30  37  -1.0
2020-12-31  35  -2.0
2021-03-31  37  0.0
2021-06-30  41  4.0
2021-09-30  43  6.0
2021-12-31  44  9.0
  • Related