Extract values from dataset to perform functions- multiple countries within dataset-CodePudding

My dataset looks as follows:

Country	Year	Value
Ireland	2010	9
Ireland	2011	11
Ireland	2012	14
Ireland	2013	17
Ireland	2014	20
France	2011	15
France	2012	19
France	2013	21
France	2014	28
Germany	2008	17
Germany	2009	20
Germany	2010	19
Germany	2011	24
Germany	2012	27
Germany	2013	32

My goal is to create a new dataset which tells me the % increase from the first year of available data for a given country, compared to the most recent, which would look roughly as follows:

Country	% increase
Ireland	122
France	87
Germany	88

In essence, I need my code for each country in my dataset, to locate the smallest and largest value for year, then take the corresponding values within the value column and calculate the % increase.

I can do this manually, however I have a lot of countries in my dataset and am looking for a more elegant way to do it. I am trying to troubleshoot my code for this however I am not having much luck as of yet.

My code looks as follows at present:

df_1["Min_value"] = df.loc[df["Year"].min(),"Value"].iloc[0]
df_1["Max_value"] = df.loc[df["Year"].max(),"Value"].iloc[0]

df_1["% increase"] = ((df_1["Max_value"]-df_1["Min_value"])/df_1["Min_value"])*100

This returns an error:

AttributeError: 'numpy.float64' object has no attribute 'iloc'

In addition to this it also has the issue that I cannot figure out a way to have the code to run individually for each country within my dataset, so this is another challenge which I am not entirely sure how to address.

Could I potentially go down the route of defining a particular function which could then be applied to each country?

CodePudding user response：

You can group by Country and aggregate min/max for both Year and Value, then calculate percentage change between min and max of the Value.

pct_df = df.groupby(['Country']).agg(['min', 'max'])['Value']\
    .apply(lambda x: x.pct_change().round(2) * 100, axis=1)\
    .drop('min', axis=1).rename(columns={'max':'% increase'}).reset_index()

print(pct_df)

The output:

    Country  % increase
0   France         87.0
1  Germany         88.0
2  Ireland        122.0