Home > front end >  Calculate percentage change between values of column in Pandas dataframe
Calculate percentage change between values of column in Pandas dataframe

Time:05-31

I have a dataframe with some price indices across 5 years, from 2017 to 2021. It looks like this:

Country Industry Year Index
US Agriculture 2017 83
US Agriculture 2018 97.2
US Agriculture 2019 100
US Agriculture 2020 112
US Agriculture 2021 108
Japan Mining 2017 88
Japan Mining 2018 93
Japan Mining 2019 100
Japan Mining 2020 104
Japan Mining 2021 112

My base year is 2019, hence the Index for every row tagged with 2019 is 100. Everything else moves up or down. I want to generate another column called Percentage_Change showing the year on year change starting from 2019 as the base year.

I tried using the pd.series.pct_change function, however, that calculates the year on year percentage change starting with 2017 and it generates an NaN value for all rows where the year is 2017, instead of 2019 which should be the base year.

I want the output to look like this:

Country Industry Year Index Percentage_change
Japan Mining 2017 88 -5.37%
Japan Mining 2018 93 -7%
Japan Mining 2019 100 0
Japan Mining 2020 104 4%
Japan Mining 2021 112 7.69%

The percentage_change for Japan between 2021 and 2020 is (112-104)/104 = 7.69%, the difference between 2020 and 2019 is (104-100)/100 = 4%, the difference between 2018 and 2019 is (93-100)/100 = -7%, the difference between 2017 and 2018 is (88-93)/93 = -5.37%

Is there any other way of calculating % change in pandas?

CodePudding user response:

pct_change is computing a change relative to the previous value (which is why 2017 is NaN), and this doesn't seem to be what you want. If you want to compute a percentage change relative to 2019, as 2019 is already normalized to 100, simply subtract 100:

df['Percentage_Change'] = df['Index'].sub(100)

output:

  Country     Industry  Year  Index  Percentage_Change
0      US  Agriculture  2017   83.0              -17.0
1      US  Agriculture  2018   97.2               -2.8
2      US  Agriculture  2019  100.0                0.0
3      US  Agriculture  2020  112.0               12.0
4      US  Agriculture  2021  108.0                8.0
5   Japan       Mining  2017   88.0              -12.0
6   Japan       Mining  2018   93.0               -7.0
7   Japan       Mining  2019  100.0                0.0
8   Japan       Mining  2020  104.0                4.0
9   Japan       Mining  2021  112.0               12.0

bidirectional pct_change

If you want a bidirectional pct_change "centered" on 100, you can use masks to compute the pct_change both ways:

df['Percentage_Change'] = (df
 .assign(ref=df['Year'].eq(2019))
 .groupby(['Country', 'Industry'], group_keys=False)
 .apply(lambda g: g['Index'].where(g['ref'].cummax()).pct_change()
                  .fillna(g['Index'][::-1].pct_change().mask(g['ref'].cummax(), 0))
       )
)

output:

  Country     Industry  Year  Index  Percentage_Change
0      US  Agriculture  2017   83.0          -0.146091
1      US  Agriculture  2018   97.2          -0.028000
2      US  Agriculture  2019  100.0           0.000000
3      US  Agriculture  2020  112.0           0.120000
4      US  Agriculture  2021  108.0          -0.035714
5   Japan       Mining  2017   88.0          -0.053763
6   Japan       Mining  2018   93.0          -0.070000
7   Japan       Mining  2019  100.0           0.000000
8   Japan       Mining  2020  104.0           0.040000
9   Japan       Mining  2021  112.0           0.076923
  • Related