Calculate percentage change between values of column in Pandas dataframe-CodePudding

I have a dataframe with some price indices across 5 years, from 2017 to 2021. It looks like this:

Country	Industry	Year	Index
US	Agriculture	2017	83
US	Agriculture	2018	97.2
US	Agriculture	2019	100
US	Agriculture	2020	112
US	Agriculture	2021	108
Japan	Mining	2017	88
Japan	Mining	2018	93
Japan	Mining	2019	100
Japan	Mining	2020	104
Japan	Mining	2021	112

My base year is 2019, hence the Index for every row tagged with 2019 is 100. Everything else moves up or down. I want to generate another column called Percentage_Change showing the year on year change starting from 2019 as the base year.

I tried using the pd.series.pct_change function, however, that calculates the year on year percentage change starting with 2017 and it generates an NaN value for all rows where the year is 2017, instead of 2019 which should be the base year.

I want the output to look like this:

Country	Industry	Year	Index	Percentage_change
Japan	Mining	2017	88	-5.37%
Japan	Mining	2018	93	-7%
Japan	Mining	2019	100	0
Japan	Mining	2020	104	4%
Japan	Mining	2021	112	7.69%

The percentage_change for Japan between 2021 and 2020 is (112-104)/104 = 7.69%, the difference between 2020 and 2019 is (104-100)/100 = 4%, the difference between 2018 and 2019 is (93-100)/100 = -7%, the difference between 2017 and 2018 is (88-93)/93 = -5.37%

Is there any other way of calculating % change in pandas?

CodePudding user response：

pct_change is computing a change relative to the previous value (which is why 2017 is NaN), and this doesn't seem to be what you want. If you want to compute a percentage change relative to 2019, as 2019 is already normalized to 100, simply subtract 100:

df['Percentage_Change'] = df['Index'].sub(100)

output:

  Country     Industry  Year  Index  Percentage_Change
0      US  Agriculture  2017   83.0              -17.0
1      US  Agriculture  2018   97.2               -2.8
2      US  Agriculture  2019  100.0                0.0
3      US  Agriculture  2020  112.0               12.0
4      US  Agriculture  2021  108.0                8.0
5   Japan       Mining  2017   88.0              -12.0
6   Japan       Mining  2018   93.0               -7.0
7   Japan       Mining  2019  100.0                0.0
8   Japan       Mining  2020  104.0                4.0
9   Japan       Mining  2021  112.0               12.0

bidirectional `pct_change`

If you want a bidirectional pct_change "centered" on 100, you can use masks to compute the pct_change both ways:

df['Percentage_Change'] = (df
 .assign(ref=df['Year'].eq(2019))
 .groupby(['Country', 'Industry'], group_keys=False)
 .apply(lambda g: g['Index'].where(g['ref'].cummax()).pct_change()
                  .fillna(g['Index'][::-1].pct_change().mask(g['ref'].cummax(), 0))
       )
)

output:

  Country     Industry  Year  Index  Percentage_Change
0      US  Agriculture  2017   83.0          -0.146091
1      US  Agriculture  2018   97.2          -0.028000
2      US  Agriculture  2019  100.0           0.000000
3      US  Agriculture  2020  112.0           0.120000
4      US  Agriculture  2021  108.0          -0.035714
5   Japan       Mining  2017   88.0          -0.053763
6   Japan       Mining  2018   93.0          -0.070000
7   Japan       Mining  2019  100.0           0.000000
8   Japan       Mining  2020  104.0           0.040000
9   Japan       Mining  2021  112.0           0.076923

bidirectional pct_change

bidirectional `pct_change`