Calculate average for subgroup - pandas-CodePudding

I have the following dataframe:

d2 = {'year': ['1996', '1997', '1998', '1996',  '1997', '1998'], 'country': ['USA', 'USA', 'USA',\
                                                                            'ITA', 'ITA', 'ITA'],\
                                                                  'value': [0, 1, 3, 2.75, 1, 8]}
df2 = pd.DataFrame(data=d2)
df2

   year country value
0   1996    USA 0.00
1   1997    USA 1.00
2   1998    USA 3.00
3   1996    ITA 2.75
4   1997    ITA 1.00
5   1998    ITA 8.00

I wish to calculate the mean for 1996 and 1997 for each group, and store it in a new column, like so:

    year    country value   2_prior_year_mean
0   1996    USA     0.00    NaN
1   1997    USA     1.00    NaN
2   1998    USA     3.00    0.5
3   1996    ITA     2.75    NaN
4   1997    ITA     1.00    NaN
5   1998    ITA     8.00    1.875

Any idea how to do this? Thanks!

CodePudding user response：

I believe you can filter rows in dataframes like so:

year_1997 = d2[d2["year"] == 1997]

This outputs a pandas Series of bools, therefore you can filter the dataframe with it using []. (More about this here)

Then you can use the pandas.Dataframe.mean to get the mean value. To add this as a new column you might be able to assign the new column, as explained in this answer.

CodePudding user response：

turns out it is easy as:

df2.loc[df2["year"]<"1998"].groupby('country')[['value']].mean()