I have the following dataframe:
d2 = {'year': ['1996', '1997', '1998', '1996', '1997', '1998'], 'country': ['USA', 'USA', 'USA',\
'ITA', 'ITA', 'ITA'],\
'value': [0, 1, 3, 2.75, 1, 8]}
df2 = pd.DataFrame(data=d2)
df2
year country value
0 1996 USA 0.00
1 1997 USA 1.00
2 1998 USA 3.00
3 1996 ITA 2.75
4 1997 ITA 1.00
5 1998 ITA 8.00
I wish to calculate the mean for 1996 and 1997 for each group, and store it in a new column, like so:
year country value 2_prior_year_mean
0 1996 USA 0.00 NaN
1 1997 USA 1.00 NaN
2 1998 USA 3.00 0.5
3 1996 ITA 2.75 NaN
4 1997 ITA 1.00 NaN
5 1998 ITA 8.00 1.875
Any idea how to do this? Thanks!
CodePudding user response:
I believe you can filter rows in dataframes like so:
year_1997 = d2[d2["year"] == 1997]
This outputs a pandas Series of bools, therefore you can filter the dataframe with it using []
. (More about this here)
Then you can use the pandas.Dataframe.mean
to get the mean value.
To add this as a new column you might be able to assign
the new column, as explained in this answer.
CodePudding user response:
turns out it is easy as:
df2.loc[df2["year"]<"1998"].groupby('country')[['value']].mean()