I have a Dataframe like the following:
Date Desc
2022-04-21 Car
2022-05-31 Bus
2022-05-31 Car
2022-05-31 Bike
2022-05-31 Car
2022-06-14 Car
2022-06-19 Car
2022-06-19 Bike
2022-07-19 Bike
2022-07-19 Bus
How can I group it by year and month and then get the count of each Desc
value as columns?
Expected Output:
Date Car Bus Bike
2022-04 1 0 0
2022-05 2 1 1
2022-06 2 0 1
2022-07 0 1 1
CodePudding user response:
This approach is short: create dummy (indicator) columns for Desc, then group by the date at the desired frequency and sum them:
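import pandas as pd

df = pd.DataFrame({'Desc':['Car','Bus','Car','Bike','Car','Car','Car','Bike','Bike','Bus'],
                   'Date':['2022-04-21','2022-05-31','2022-05-31','2022-05-31','2022-05-31',
                           '2022-06-14','2022-06-19','2022-06-19','2022-07-19','2022-07-19']})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
# Empty prefix and separator keep the plain labels (Car, Bus, Bike) as column names.
# freq='M' means month end; pandas 2.2 and later spell this alias 'ME'.
pd.get_dummies(df, prefix='', prefix_sep='').groupby(pd.Grouper(freq='M')).sum()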
Result (rows are labelled with the month-end date, columns are in alphabetical order):
Date Bike Bus Car
2022-04-30 0 0 1
2022-05-31 1 1 2
2022-06-30 1 0 2
2022-07-31 1 1 0
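Grouping with pd.Grouper(freq='M') labels each group with a month-end timestamp, and get_dummies orders the columns alphabetically. If you want 'YYYY-MM' labels and the column order from the question, one possible variation is to group by a monthly period instead. This is only a sketch of that idea; the names dummies and monthly are made up for illustration, and the column order is hard-coded from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Desc': ['Car', 'Bus', 'Car', 'Bike', 'Car', 'Car', 'Car', 'Bike', 'Bike', 'Bus'],
    'Date': ['2022-04-21', '2022-05-31', '2022-05-31', '2022-05-31', '2022-05-31',
             '2022-06-14', '2022-06-19', '2022-06-19', '2022-07-19', '2022-07-19'],
})
df['Date'] = pd.to_datetime(df['Date'])

# One indicator column per Desc value; dtype=int keeps the sums as plain integers.
dummies = pd.get_dummies(df['Desc'], dtype=int)

# Group by calendar month (a Period such as 2022-04) and add up the indicators.
monthly = dummies.groupby(df['Date'].dt.to_period('M')).sum()

# Reorder the columns to the order used in the question (hard-coded assumption).
monthly = monthly[['Car', 'Bus', 'Bike']]
print(monthly)
```

The index here is a PeriodIndex; call monthly.index.astype(str) if you need plain strings.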
CodePudding user response:
You can try pd.crosstab, which counts how often each Desc value occurs per month:
df['Date'] = pd.to_datetime(df['Date'])
out = pd.crosstab(
df['Date'].dt.strftime('%Y-%m'), df['Desc']
).reset_index().rename_axis(columns=None)
print(out)
Date Bike Bus Car
0 2022-04 0 0 1
1 2022-05 1 1 2
2 2022-06 1 0 2
3 2022-07 1 1 0
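pd.crosstab sorts the columns alphabetically. If the order from the question (Car, Bus, Bike) matters, a reindex on the columns should restore it. The sketch below rebuilds the sample data so it runs on its own; the list of labels is an assumption taken from the question, not something pandas infers.

```python
import pandas as pd

df = pd.DataFrame({
    'Desc': ['Car', 'Bus', 'Car', 'Bike', 'Car', 'Car', 'Car', 'Bike', 'Bike', 'Bus'],
    'Date': ['2022-04-21', '2022-05-31', '2022-05-31', '2022-05-31', '2022-05-31',
             '2022-06-14', '2022-06-19', '2022-06-19', '2022-07-19', '2022-07-19'],
})
df['Date'] = pd.to_datetime(df['Date'])

# Count Desc values per 'YYYY-MM' string.
out = pd.crosstab(df['Date'].dt.strftime('%Y-%m'), df['Desc'])

# Put the columns in the requested order; fill_value=0 covers any label
# in the list that never occurs in the data.
out = out.reindex(columns=['Car', 'Bus', 'Bike'], fill_value=0)

# Turn the month back into a regular column and drop the 'Desc' axis name.
out = out.reset_index().rename_axis(columns=None)
print(out)
```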
CodePudding user response:
You can also do this by manipulating the date strings directly, without converting them to datetime. First split each string into year, month, and day, and put the parts in a pandas DataFrame:
df_ix = pd.DataFrame((x for x in df.Date.str.split("-")), columns=["Y", "M", "D"])
Then concatenate these columns with your original DataFrame and group by year, month, and vehicle type. Use count to aggregate each group, keep a single column, and rename it to avoid confusion:
df_multiindex = pd.concat((df, df_ix), axis=1).groupby(["Y", "M", "Desc"]).count()["Date"]
df_multiindex.name = "Vehicle counts"
The output is
print(df_multiindex)
Y     M   Desc
2022  04  Car     1
      05  Bike    1
          Bus     1
          Car     2
      06  Bike    1
          Car     2
      07  Bike    1
          Bus     1
Name: Vehicle counts, dtype: int64
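The result above is a long Series with one row per (year, month, vehicle) combination. To get the vehicle types as columns, as the question asks, you could unstack the Desc level. A minimal sketch, assuming the Date column still holds strings; the names parts, counts and wide are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Desc': ['Car', 'Bus', 'Car', 'Bike', 'Car', 'Car', 'Car', 'Bike', 'Bike', 'Bus'],
    'Date': ['2022-04-21', '2022-05-31', '2022-05-31', '2022-05-31', '2022-05-31',
             '2022-06-14', '2022-06-19', '2022-06-19', '2022-07-19', '2022-07-19'],
})

# Split 'YYYY-MM-DD' strings into three string columns.
parts = df['Date'].str.split('-', expand=True)
parts.columns = ['Y', 'M', 'D']

# Count rows per (year, month, vehicle type).
counts = pd.concat((df, parts), axis=1).groupby(['Y', 'M', 'Desc']).count()['Date']

# Move the Desc level into the columns; combinations that never occur become 0.
wide = counts.unstack('Desc', fill_value=0)
print(wide)
```

The rows of wide are indexed by (Y, M) as strings; wide.reset_index() turns them into ordinary columns.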