I have a transactional table that lists the gender of each person who started in one column, and the date they started in another. I want to perform a groupby that results in a data frame with a count of each gender by date. Any idea?
   Gender   LastStart
1       M  2013-05-21
2       M  2013-05-24
3       F  2013-05-27
4       M  2013-05-27
5       F  2013-05-28
6       F  2013-05-28
This should result in:
            M  F
2013-05-21  1  0
2013-05-24  1  0
2013-05-27  1  1
2013-05-28  0  2
I think I need to run a groupby over a list of columns and then a pivot, but my groupby produces a Series, so the pivot won't work. I'm very confused, so any help is much appreciated!
CodePudding user response:
Use pandas.crosstab, which counts how often each Gender value occurs for each LastStart date:
out = pd.crosstab(df['LastStart'], df['Gender'])
Output:
>>> out
Gender      F  M
LastStart
2013-05-21  0  1
2013-05-24  0  1
2013-05-27  1  1
2013-05-28  2  0
If you want to remove the axis labels, i.e. Gender and LastStart, use
out = pd.crosstab(df['LastStart'], df['Gender']).rename_axis(index=None, columns=None)
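For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's table and applies the crosstab. The DataFrame construction is an assumption (the dates are kept as plain strings), and the final `.loc` step is only there to put the columns in the M, F order the question asked for.

```python
import pandas as pd

# Sample data copied from the question; dates are kept as strings (an assumption).
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "M", "F", "F"],
    "LastStart": ["2013-05-21", "2013-05-24", "2013-05-27",
                  "2013-05-27", "2013-05-28", "2013-05-28"],
})

out = (
    pd.crosstab(df["LastStart"], df["Gender"])  # count Gender values per date
    .rename_axis(index=None, columns=None)      # drop the axis labels
    .loc[:, ["M", "F"]]                         # reorder columns as in the question
)
print(out)
```

crosstab sorts the column labels, which is why F comes before M unless you reorder them explicitly.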
CodePudding user response:
Use custom aggregation functions, one per column you want in the output.
df.groupby("LastStart").agg(
    M=("Gender", lambda s: s.eq("M").sum()),
    F=("Gender", lambda s: s.eq("F").sum()),
)
The “syntax” used in .agg() here is <output column>=(<input column>, <aggregation function>), which pandas calls named aggregation.
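If you would rather not hard-code one keyword per category, the same keyword arguments can be built in a dict and unpacked into .agg(). This is only a sketch under assumptions: the sample DataFrame is rebuilt from the question, the `aggs` name is made up for illustration, and it relies on a pandas version that supports named aggregation with several lambdas on one column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "M", "F", "F"],
    "LastStart": ["2013-05-21", "2013-05-24", "2013-05-27",
                  "2013-05-27", "2013-05-28", "2013-05-28"],
})

# One (input column, function) pair per distinct Gender value.
# g=g binds the current value so each lambda counts its own category.
aggs = {
    g: ("Gender", lambda s, g=g: s.eq(g).sum())
    for g in df["Gender"].unique()
}

out = df.groupby("LastStart").agg(**aggs)
print(out)
```

The output columns follow the order in which the values first appear in the data, so here M comes before F.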
CodePudding user response:
Here is one way to do it: create a temporary cnt column and then use pivot_table to count it per date and gender.
df.assign(cnt=1).pivot_table(index='LastStart', columns='Gender', values='cnt', aggfunc='count').fillna(0).astype(int)
Gender      F  M
LastStart
2013-05-21  0  1
2013-05-24  0  1
2013-05-27  1  1
2013-05-28  2  0
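A related sketch, closer to the groupby-then-pivot idea from the question and not taken from any of the answers above: group on both columns, take the group sizes, and unstack the Gender level. The groupby does return a Series, and unstack is what turns it into the wide table. The sample DataFrame is rebuilt from the question as an assumption.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "M", "F", "F"],
    "LastStart": ["2013-05-21", "2013-05-24", "2013-05-27",
                  "2013-05-27", "2013-05-28", "2013-05-28"],
})

# size() yields a Series indexed by (LastStart, Gender);
# unstack() moves Gender into the columns, filling missing pairs with 0.
counts = df.groupby(["LastStart", "Gender"]).size().unstack(fill_value=0)
print(counts)
```

Passing fill_value=0 to unstack avoids the NaN values, so no separate fillna(0).astype(int) step is needed.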