I am trying to calculate year or year growth for a variable in a Pandas dataframe. My data looks like this:
Year | Country | Industry | Value |
---|---|---|---|
2000 | USA | Manufacturing | 5 |
2000 | Mexico | Manufacturing | 10 |
2001 | Mexico | Manufacturing | 15 |
2002 | Mexico | Other | 20 |
I have different number of observations depending on the Country or Industry. Expected output:
Year | Country | Industry | Value | YOY |
---|---|---|---|---|
2000 | USA | Manufacturing | 5 | NaN |
2000 | Mexico | Manufacturing | 10 | NaN |
2001 | Mexico | Manufacturing | 15 | 50% |
2002 | Mexico | Other | 20 | NaN |
I tried different things including:
df.groupby(['Country','Industry','Year'])['Value'].pct_change()
df['YOY'] = (df['Value'] - df.sort_values(by=['Country','Industry','Year']).groupby(['Country','Industry'])['Value'].shift(1))) / df['Value']
The first line calculates growth between rows without resetting for a new Country or Industry. The second one has incoherent results.
Any lead I could take? Thanks!!
CodePudding user response:
Try this:
df['YOY'] = df.groupby(['Country','Industry'])['Value'].pct_change().mul(100)
Output:
>>> df
Year Country Industry Value YOY
0 2000 USA Manufacturing 5 NaN
1 2000 Mexico Manufacturing 10 NaN
2 2001 Mexico Manufacturing 15 50.0
3 2002 Mexico Other 20 NaN