My dataset looks as follows:
Country | Year | Value |
---|---|---|
Ireland | 2010 | 9 |
Ireland | 2011 | 11 |
Ireland | 2012 | 14 |
Ireland | 2013 | 17 |
Ireland | 2014 | 20 |
France | 2011 | 15 |
France | 2012 | 19 |
France | 2013 | 21 |
France | 2014 | 28 |
Germany | 2008 | 17 |
Germany | 2009 | 20 |
Germany | 2010 | 19 |
Germany | 2011 | 24 |
Germany | 2012 | 27 |
Germany | 2013 | 32 |
My goal is to create a new dataset which tells me the % increase from the first year of available data for a given country, compared to the most recent, which would look roughly as follows:
Country | % increase |
---|---|
Ireland | 122 |
France | 87 |
Germany | 88 |
In essence, I need my code for each country in my dataset, to locate the smallest and largest value for year, then take the corresponding values within the value column and calculate the % increase.
I can do this manually, however I have a lot of countries in my dataset and am looking for a more elegant way to do it. I am trying to troubleshoot my code for this however I am not having much luck as of yet.
My code looks as follows at present:
df_1["Min_value"] = df.loc[df["Year"].min(),"Value"].iloc[0]
df_1["Max_value"] = df.loc[df["Year"].max(),"Value"].iloc[0]
df_1["% increase"] = ((df_1["Max_value"]-df_1["Min_value"])/df_1["Min_value"])*100
This returns an error:
AttributeError: 'numpy.float64' object has no attribute 'iloc'
In addition to this it also has the issue that I cannot figure out a way to have the code to run individually for each country within my dataset, so this is another challenge which I am not entirely sure how to address.
Could I potentially go down the route of defining a particular function which could then be applied to each country?
CodePudding user response:
You can group by Country
and aggregate min/max
for both Year
and Value
, then calculate percentage change between min
and max
of the Value
.
pct_df = df.groupby(['Country']).agg(['min', 'max'])['Value']\
.apply(lambda x: x.pct_change().round(2) * 100, axis=1)\
.drop('min', axis=1).rename(columns={'max':'% increase'}).reset_index()
print(pct_df)
The output:
Country % increase
0 France 87.0
1 Germany 88.0
2 Ireland 122.0