I have a dataframe that looks similar to the following:
ColA ColB Year ...
=====================
1 2 2007
2 5 2007
3 4 2007
4 3 2007
5 2 2008
6 1 2008
7 0 2008
8 9 2008
...
I am using dat[['ColA', 'ColB']].describe(). When I do this, as expected, it displays summary statistics for both columns over all years. I would like to have summary statistics for each column by year. In the example above, I would have 4 columns of statistics (one for ColA in 2007, one for ColA in 2008, one for ColB in 2007, and one for ColB in 2008). Is there a way to extend the capabilities of DataFrame.describe() to accommodate this?
Answer:
You can group by year before calling describe():
```python
import pandas as pd

df_example = pd.DataFrame({"colA": [1, 2, 3, 4, 5, 6, 7, 8],
                           "Year": [2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008]})
# One row of summary statistics per year
des = df_example.groupby("Year").describe()
print(des)
```
```
     colA
    count mean       std  min   25%  50%   75%  max
Year
2007  4.0  2.5  1.290994  1.0  1.75  2.5  3.25  4.0
2008  4.0  6.5  1.290994  5.0  5.75  6.5  7.25  8.0
```
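Applied to the question's own data, the same groupby call works once you select both columns. If you want exactly the layout described in the question (statistics as rows, one column per column/year pair), one option is to call describe() per year and concatenate the results side by side. The snippet below is a minimal sketch of that, assuming the "..." columns in the question can be ignored; the names dat, by_year and stats are illustrative, and the pd.concat step is a swapped-in alternative to the single groupby call above, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (the "..." columns are omitted)
dat = pd.DataFrame({
    "ColA": [1, 2, 3, 4, 5, 6, 7, 8],
    "ColB": [2, 5, 4, 3, 2, 1, 0, 9],
    "Year": [2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008],
})

# Same idea as the answer, restricted to the two columns of interest:
# one row per year, columns are (column, statistic) pairs.
by_year = dat.groupby("Year")[["ColA", "ColB"]].describe()

# Alternative layout: run describe() once per year and place the results
# side by side, so the statistics stay as rows and each (column, year)
# pair becomes its own column.
stats = pd.concat(
    {year: group[["ColA", "ColB"]].describe() for year, group in dat.groupby("Year")},
    axis=1,
)
# concat labels the columns (year, column); swap the levels and sort
# to get ColA/2007, ColA/2008, ColB/2007, ColB/2008.
stats = stats.swaplevel(axis=1).sort_index(axis=1)

print(stats)
```

If you only need the years as columns and do not mind the column names and statistics being stacked in the row index, by_year.T is a shorter way to get there.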