I'm trying to create like a pivot table using groupby
to group by incidents
over yearly tallies, like so:
Original data (sample):
INCIDENT INCIDENT_YEAR
run 2020
walk 2021
jump 2019
run 2021
run 2020
And I want to groupby
to look like:
INCIDENT 2019 2020 2021
jump 1
run 2 1
walk 1
I know this can be done in Excel, but I want to learn how to do it in pandas.
My best attempt is
df.groupby('INCIDENT').count()
But that's not giving me the desired columns.
CodePudding user response:
Use pd.crosstab
:
stats = pd.crosstab(df['INCIDENT'], df['INCIDENT_YEAR'])
Output:
>>> stats
INCIDENT_YEAR 2019 2020 2021
INCIDENT
jump 1 0 0
run 0 2 1
walk 0 0 1
CodePudding user response:
Update 2
If you have another level of grouping:
df.groupby(['INCIDENT', 'DISTRICT', 'INCIDENT_YEAR']).size().unstack(fill_value=0)
# OR
df.assign(dummy=1).pivot_table('dummy', ['INCIDENT', 'DISTRICT'], 'INCIDENT_YEAR',
aggfunc='sum', fill_value=0)
# Output
INCIDENT_YEAR 2019 2020 2021
INCIDENT DISTRICT
jump District A 1 0 0
run District A 0 2 1
walk District A 0 0 1
Update
It's possible to use groupby
too:
>>> df.groupby(['INCIDENT', 'INCIDENT_YEAR']).size().unstack(fill_value=0)
INCIDENT_YEAR 2019 2020 2021
INCIDENT
jump 1 0 0
run 0 2 1
walk 0 0 1
You can use pivot_table
as alternative of pd.crosstab
:
>>> df.assign(dummy=1).pivot_table('dummy', 'INCIDENT', 'INCIDENT_YEAR',
aggfunc='sum', fill_value=0)
INCIDENT_YEAR 2019 2020 2021
INCIDENT
jump 1 0 0
run 0 2 1
walk 0 0 1
CodePudding user response:
You could use crosstab
:
out = pd.crosstab(df['INCIDENT'], df['INCIDENT_YEAR']).reset_index().rename_axis(columns=[None])
Output:
INCIDENT 2019 2020 2021
0 jump 1 0 0
1 run 0 2 1
2 walk 0 0 1
CodePudding user response:
Adding a little bit of variety, although enke's and richardec's answers remain the favorite:
df.reset_index().pivot_table(index='INCIDENT',columns='INCIDENT_YEAR',values='index',aggfunc='count',fill_value=0)
Outputs:
INCIDENT_YEAR 2019 2020 2021
INCIDENT
jump 1.0 0 0
run 0 2.0 1.0
walk 0 0 1.0