How to provide a groupby of dates with a count from a categorical column separately

Time:07-03

I have a transactional table that lists the gender of each person who started in one column, and the date they started in another. I want to perform a groupby that results in a data frame with a count of each gender by date. Any idea?

   Gender   LastStart
1       M  2013-05-21
2       M  2013-05-24
3       F  2013-05-27
4       M  2013-05-27
5       F  2013-05-28
6       F  2013-05-28

Should result in

             M   F
2013-05-21   1   0
2013-05-24   1   0
2013-05-27   1   1
2013-05-28   0   2

I think I need to run a groupby over a list and then a pivot, but my groupby is producing a Series, so the pivot won't work. I'm very confused, so any help is much appreciated!

CodePudding user response:

Use pandas.crosstab

out = pd.crosstab(df['LastStart'], df['Gender'])

Output:

>>> out

Gender      F  M
LastStart       
2013-05-21  0  1
2013-05-24  0  1
2013-05-27  1  1
2013-05-28  2  0

If you want to remove the axis labels, i.e. Gender and LastStart, use

out = pd.crosstab(df['LastStart'], df['Gender']).rename_axis(index=None, columns=None)
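For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample table from the question and applies the same crosstab call. It assumes LastStart holds plain strings (the question does not state the dtype), and the column reordering to M, F is an extra step added here only to match the layout the question asks for.

```python
import pandas as pd

# Sample data copied from the question (LastStart kept as strings: an assumption)
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "M", "F", "F"],
    "LastStart": ["2013-05-21", "2013-05-24", "2013-05-27",
                  "2013-05-27", "2013-05-28", "2013-05-28"],
})

# Count each Gender per LastStart; combinations that never occur become 0
out = pd.crosstab(df["LastStart"], df["Gender"])

# Put the columns in M, F order as in the question and drop the axis labels
out = out[["M", "F"]].rename_axis(index=None, columns=None)
print(out)
```

crosstab sorts the column labels, which is why F comes before M unless the columns are reordered explicitly.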

CodePudding user response:

Use custom aggregation functions, one per column you want in the output.

df.groupby("LastStart").agg(
    M=("Gender", lambda s: s.eq("M").sum()),
    F=("Gender", lambda s: s.eq("F").sum()),
)

The “syntax” used in .agg() here, which pandas calls named aggregation, is <output column>=(<input column>, <aggregation function>).
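If the categories are not known in advance, the same pattern can probably be generated instead of typed out by hand. The sketch below is one way to do that, assuming a reasonably recent pandas and that the category values are strings usable as keyword names; named_aggs is a name made up for this example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "M", "F", "F"],
    "LastStart": ["2013-05-21", "2013-05-24", "2013-05-27",
                  "2013-05-27", "2013-05-28", "2013-05-28"],
})

# Build one (input column, function) pair per category found in the data.
# The default argument g=g pins the current category inside each lambda.
named_aggs = {
    g: ("Gender", lambda s, g=g: s.eq(g).sum())
    for g in df["Gender"].unique()
}

# Equivalent to writing M=(...), F=(...) by hand
counts = df.groupby("LastStart").agg(**named_aggs)
print(counts)
```

The columns come out in order of first appearance in the data, so M precedes F here.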

CodePudding user response:

Here is one way to do it: create a temporary cnt column and then use pivot_table.

df.assign(cnt=1).pivot_table(index='LastStart', columns='Gender', values='cnt', aggfunc='count').fillna(0).astype(int)

Output:

Gender      F  M
LastStart
2013-05-21  0  1
2013-05-24  0  1
2013-05-27  1  1
2013-05-28  2  0
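The question mentions that the groupby produced a Series that could not be pivoted. One possible way around that, not taken from the answers above, is to group by both columns and unstack the Gender level of the resulting Series. A minimal sketch, again assuming LastStart holds strings:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "M", "F", "F"],
    "LastStart": ["2013-05-21", "2013-05-24", "2013-05-27",
                  "2013-05-27", "2013-05-28", "2013-05-28"],
})

# size() returns a Series indexed by (LastStart, Gender);
# unstack() moves Gender into the columns, filling missing pairs with 0
counts = df.groupby(["LastStart", "Gender"]).size().unstack(fill_value=0)
print(counts)
```

No temporary column is needed here because size() counts rows directly.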