Home > Back-end >  YOY growth based on ID
YOY growth based on ID

Time:02-10

I am trying to calculate year or year growth for a variable in a Pandas dataframe. My data looks like this:

Year Country Industry Value
2000 USA Manufacturing 5
2000 Mexico Manufacturing 10
2001 Mexico Manufacturing 15
2002 Mexico Other 20

I have different number of observations depending on the Country or Industry. Expected output:

Year Country Industry Value YOY
2000 USA Manufacturing 5 NaN
2000 Mexico Manufacturing 10 NaN
2001 Mexico Manufacturing 15 50%
2002 Mexico Other 20 NaN

I tried different things including:

df.groupby(['Country','Industry','Year'])['Value'].pct_change()

df['YOY'] = (df['Value'] - df.sort_values(by=['Country','Industry','Year']).groupby(['Country','Industry'])['Value'].shift(1))) / df['Value']

The first line calculates growth between rows without resetting for a new Country or Industry. The second one has incoherent results.

Any lead I could take? Thanks!!

CodePudding user response:

Try this:

df['YOY'] = df.groupby(['Country','Industry'])['Value'].pct_change().mul(100)

Output:

>>> df
   Year Country       Industry  Value   YOY
0  2000     USA  Manufacturing      5   NaN
1  2000  Mexico  Manufacturing     10   NaN
2  2001  Mexico  Manufacturing     15  50.0
3  2002  Mexico          Other     20   NaN
  • Related