Groupby Month and Year and get the count of unique items as columns

Time:07-02

I have a Dataframe like the following:

  Date              Desc
 2022-04-21         Car
 2022-05-31         Bus
 2022-05-31         Car
 2022-05-31         Bike
 2022-05-31         Car
 2022-06-14         Car
 2022-06-19         Car
 2022-06-19         Bike
 2022-07-19         Bike
 2022-07-19         Bus

How can I group it by month and year and then get the count of Desc as columns?

Expected Output:

Date     Car  Bus  Bike
2022-04   1    0    0
2022-05   2    1    1
2022-06   2    0    1
2022-07   0    1    1

CodePudding user response:

One way is to turn Desc into dummy (indicator) columns, then group by the date at a monthly frequency and sum the indicators:

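import pandas as pd

df = pd.DataFrame({'Desc':['Car','Bus','Car','Bike','Car','Car','Car','Bike','Bike','Bus'],
                   'Date':['2022-04-21','2022-05-31','2022-05-31','2022-05-31','2022-05-31',
                           '2022-06-14','2022-06-19','2022-06-19','2022-07-19','2022-07-19']})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

# One indicator column per Desc value (no prefix), summed within each calendar month.
# Newer pandas releases spell the month-end alias 'ME' instead of 'M'.
pd.get_dummies(df, prefix='', prefix_sep='').groupby(pd.Grouper(freq='M')).sum()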

result:

            Bike  Bus  Car
Date
2022-04-30     0    0    1
2022-05-31     1    1    2
2022-06-30     1    0    2
2022-07-31     1    1    0
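
Grouping with pd.Grouper(freq='M') labels each group by a month-end timestamp rather than by the YYYY-MM label the question asks for. A minimal sketch of one way to get YYYY-MM labels, assuming the same sample data and grouping on a monthly period instead (the names dummies and monthly are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Desc": ["Car", "Bus", "Car", "Bike", "Car", "Car", "Car", "Bike", "Bike", "Bus"],
    "Date": ["2022-04-21", "2022-05-31", "2022-05-31", "2022-05-31", "2022-05-31",
             "2022-06-14", "2022-06-19", "2022-06-19", "2022-07-19", "2022-07-19"],
})
df["Date"] = pd.to_datetime(df["Date"])

# One integer indicator column per Desc value.
dummies = pd.get_dummies(df["Desc"], dtype=int)

# Group the indicators by the monthly period of each date and sum them.
# The resulting index is a PeriodIndex whose labels print as YYYY-MM.
monthly = dummies.groupby(df["Date"].dt.to_period("M")).sum()
print(monthly)
```

If plain strings are preferred over periods, monthly.index.astype(str) should give the same labels as text.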

CodePudding user response:

You can try pd.crosstab, which counts each combination of month and Desc directly:

df['Date'] = pd.to_datetime(df['Date'])

out = pd.crosstab(
    df['Date'].dt.strftime('%Y-%m'), df['Desc']
).reset_index().rename_axis(columns=None)
print(out)

      Date  Bike  Bus  Car
0  2022-04     0    0    1
1  2022-05     1    1    2
2  2022-06     1    0    2
3  2022-07     1    1    0
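
pd.crosstab sorts the columns alphabetically, while the expected output lists them as Car, Bus, Bike. A small sketch of one way to get that order, assuming the same sample data; the list wanted is a hypothetical name introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Desc": ["Car", "Bus", "Car", "Bike", "Car", "Car", "Car", "Bike", "Bike", "Bus"],
    "Date": ["2022-04-21", "2022-05-31", "2022-05-31", "2022-05-31", "2022-05-31",
             "2022-06-14", "2022-06-19", "2022-06-19", "2022-07-19", "2022-07-19"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Count each (month, Desc) combination.
counts = pd.crosstab(df["Date"].dt.strftime("%Y-%m"), df["Desc"])

# Put the columns in the requested order; fill_value=0 covers any listed
# category that happens not to occur in the data.
wanted = ["Car", "Bus", "Bike"]  # assumed column order, taken from the question
out = counts.reindex(columns=wanted, fill_value=0).reset_index().rename_axis(columns=None)
print(out)
```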

CodePudding user response:

You can also do this by manipulating the date strings directly, without converting them to datetime. First split each date into year, month, and day, and put the parts in a pandas DataFrame:

df_ix = pd.DataFrame((x for x in df.Date.str.split("-")), columns=["Y", "M", "D"])

Then concatenate these columns with your original DataFrame and group by year, month, and vehicle type. Use count to aggregate each group, keeping a single column and renaming it to avoid confusion:

df_multiindex = pd.concat((df, df_ix), axis=1).groupby(["Y", "M", "Desc"]).count()["Date"] 
df_multiindex.name = "Vehicle counts"

The output is

print(df_multiindex) 
Y     M   Desc
2022  04  Car     1
      05  Bike    1
          Bus     1
          Car     2
      06  Bike    1
          Car     2
      07  Bike    1
          Bus     1
Name: Vehicle counts, dtype: int64
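
This result is in long form, with one row per (year, month, vehicle) group. To get one column per vehicle, as in the expected output, the Desc level can be unstacked. A minimal sketch, assuming Date still holds the original strings; parts, counts, and wide are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "Desc": ["Car", "Bus", "Car", "Bike", "Car", "Car", "Car", "Bike", "Bike", "Bus"],
    "Date": ["2022-04-21", "2022-05-31", "2022-05-31", "2022-05-31", "2022-05-31",
             "2022-06-14", "2022-06-19", "2022-06-19", "2022-07-19", "2022-07-19"],
})

# Split "YYYY-MM-DD" strings into separate year, month, and day columns.
parts = df["Date"].str.split("-", expand=True)
parts.columns = ["Y", "M", "D"]

# Count rows per (year, month, vehicle) group.
counts = pd.concat((df, parts), axis=1).groupby(["Y", "M", "Desc"])["Date"].count()

# Move the Desc level into the columns; missing combinations become 0.
wide = counts.unstack("Desc", fill_value=0)
print(wide)
```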