pandas: groupby incident type over yearly tallies-CodePudding

I'm trying to create like a pivot table using groupby to group by incidents over yearly tallies, like so:

Original data (sample):

INCIDENT INCIDENT_YEAR
run      2020
walk     2021
jump     2019
run      2021
run      2020

And I want to groupby to look like:

INCIDENT 2019 2020 2021
jump     1
run           2    1
walk               1

I know this can be done in Excel, but I want to learn how to do it in pandas.

My best attempt is

df.groupby('INCIDENT').count()

But that's not giving me the desired columns.

CodePudding user response：

Use pd.crosstab:

stats = pd.crosstab(df['INCIDENT'], df['INCIDENT_YEAR'])

Output:

>>> stats
INCIDENT_YEAR  2019  2020  2021
INCIDENT                       
jump              1     0     0
run               0     2     1
walk              0     0     1

CodePudding user response：

Update 2

If you have another level of grouping:

df.groupby(['INCIDENT', 'DISTRICT', 'INCIDENT_YEAR']).size().unstack(fill_value=0)

# OR

df.assign(dummy=1).pivot_table('dummy', ['INCIDENT', 'DISTRICT'], 'INCIDENT_YEAR',
                               aggfunc='sum', fill_value=0)

# Output
INCIDENT_YEAR        2019  2020  2021
INCIDENT DISTRICT                    
jump     District A     1     0     0
run      District A     0     2     1
walk     District A     0     0     1

Update

It's possible to use groupby too:

>>> df.groupby(['INCIDENT', 'INCIDENT_YEAR']).size().unstack(fill_value=0)
INCIDENT_YEAR  2019  2020  2021
INCIDENT                       
jump              1     0     0
run               0     2     1
walk              0     0     1

You can use pivot_table as alternative of pd.crosstab:

>>> df.assign(dummy=1).pivot_table('dummy', 'INCIDENT', 'INCIDENT_YEAR',
                                   aggfunc='sum', fill_value=0)
INCIDENT_YEAR  2019  2020  2021
INCIDENT                       
jump              1     0     0
run               0     2     1
walk              0     0     1

CodePudding user response：

You could use crosstab:

out = pd.crosstab(df['INCIDENT'], df['INCIDENT_YEAR']).reset_index().rename_axis(columns=[None])

Output:

  INCIDENT  2019  2020  2021
0     jump     1     0     0
1      run     0     2     1
2     walk     0     0     1

CodePudding user response：

Adding a little bit of variety, although enke's and richardec's answers remain the favorite:

df.reset_index().pivot_table(index='INCIDENT',columns='INCIDENT_YEAR',values='index',aggfunc='count',fill_value=0)

Outputs:

INCIDENT_YEAR  2019  2020  2021
INCIDENT                       
jump            1.0     0     0
run               0   2.0   1.0
walk              0     0   1.0