Counting number of occurrences for each item per month in pandas dataframe


Date        Items  Location
2022-01-01  Item1  Location1
2022-01-01  Item2  Location1
2022-01-02  Item1  Location3
2022-01-03  Item3  Location1
2022-02-01  Item1  Location2
...         ...    ...

I have a pandas dataframe that's similar to the table above. How do I create a new dataframe with the Items column grouped and made into the index, and the Date column organized into specific months with each month as a column? The numbers inside the dataframe will be the number of occurrences of each item for each month.

I'll need to change the date to months and change the index, but I'm not too sure how to do it exactly.

The layout of the table I want to create is something like the one below:

        Date
Items   Jan  Feb
Item1     5    3
Item2     3    2
Item3     3    5
...     ...  ...

CodePudding user response:

Looks like you want a crosstab using periods of your dates:

out = pd.crosstab(df['Items'], pd.PeriodIndex(df['Date'], freq='M'))

output:

col_0  2022-01  2022-02
Items                  
Item1        2        1
Item2        1        0
Item3        1        0
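If you want the month abbreviations (Jan, Feb) the question asks for rather than period labels, you can rename the crosstab's columns afterwards. A small runnable sketch, using a dataframe built from the question's sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-02",
                            "2022-01-03", "2022-02-01"]),
    "Items": ["Item1", "Item2", "Item1", "Item3", "Item1"],
    "Location": ["Location1", "Location1", "Location3",
                 "Location1", "Location2"],
})

out = pd.crosstab(df["Items"], pd.PeriodIndex(df["Date"], freq="M"))
# Each column label is a monthly Period; format it as a month abbreviation
out.columns = [p.strftime("%b") for p in out.columns]
print(out)
```

Note that this only relabels the columns; with data spanning multiple years you would want to keep the year in the label to avoid merging e.g. Jan 2021 with Jan 2022.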

CodePudding user response:

What you want is called a pivot. In order for the months to appear in the correct order (Jan, Feb, Mar, ...) and not alphabetical order (Apr, Aug, Dec, ...), you also need to convert the month names to an ordered categorical type:

# Generate the month names Jan, Feb, Mar, ... and indicate
# that they have an intrinsic order
MonthNameDtype = pd.CategoricalDtype(
    pd.date_range("2022-01-01", freq="MS", periods=12).strftime("%b"),
    ordered=True,
)

# Extract the month name from the dates
df["Month"] = df["Date"].dt.strftime("%b").astype(MonthNameDtype)

# Pivot
df.pivot_table(index="Items", columns="Month", values="Location", aggfunc="count")
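With the question's sample data, this pivot leaves NaN (and hence floats) for item/month pairs that never occur. Passing `fill_value=0` keeps the counts as integers, and `observed=True` limits the columns to months that actually appear; a runnable sketch of the same approach:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-02",
                            "2022-01-03", "2022-02-01"]),
    "Items": ["Item1", "Item2", "Item1", "Item3", "Item1"],
    "Location": ["Location1", "Location1", "Location3",
                 "Location1", "Location2"],
})

# Ordered categorical so columns sort Jan, Feb, ... not alphabetically
month_names = pd.date_range("2022-01-01", freq="MS", periods=12).strftime("%b")
MonthNameDtype = pd.CategoricalDtype(month_names, ordered=True)
df["Month"] = df["Date"].dt.strftime("%b").astype(MonthNameDtype)

# fill_value=0 turns missing item/month combinations into integer zeros
res = df.pivot_table(index="Items", columns="Month", values="Location",
                     aggfunc="count", fill_value=0, observed=True)
print(res)
```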

CodePudding user response:

Add a month column to your dataframe:

df['month'] = df['Date'].dt.strftime('%b')

Then the following:

res = (df
       .groupby(["Items", "month"])
       .size()
       .reset_index()
       .pivot(columns="month", index="Items")
       )

should output the desired result:

month   Feb  Jan
Items           
Item1   1.0  2.0
Item2   NaN  1.0
Item3   NaN  1.0
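The NaN/float upcast shown above comes from the `pivot` step introducing missing combinations. One way to avoid it is to replace `reset_index` + `pivot` with `unstack(fill_value=0)`, which fills the holes while still integer-typed — a sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-02",
                            "2022-01-03", "2022-02-01"]),
    "Items": ["Item1", "Item2", "Item1", "Item3", "Item1"],
})

df["month"] = df["Date"].dt.strftime("%b")

# size() counts rows per (item, month); unstack pivots months to columns
res = (df.groupby(["Items", "month"])
         .size()
         .unstack("month", fill_value=0))
print(res)
```

The month columns still come out in alphabetical order here; combining this with the ordered-categorical trick from the previous answer fixes that.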

CodePudding user response:

Using the link from the OP's GitHub, a complete solution using Grouper and pivot:

df = pd.read_csv(
    "https://raw.githubusercontent.com/yongkheehou/recyclegowheredata/main/random_generated_website_usage_data.csv",
    parse_dates=["Date"],
    index_col="Date"
)                   
g = (
    df.groupby([pd.Grouper(freq="M"), "Item"])
    .count().reset_index()
    .pivot(columns="Date", index="Item", values="Bin Location")
    .fillna(0)
    .astype(int)
)

It seems right:

>>> g.iloc[:5, :5]
Date              2021-01-31  2021-02-28  2021-03-31  2021-04-30  2021-05-31
Item
Aerosol Can               11          14          10          11          15
Air-Conditioners           1           2           0           0           2
Backpacks                  0           0           1           0           0
Baking Glassware           0           0           0           0           0
Bedding                    0           0           2           2           1

This answer is essentially the same as @mozway's shorter and more elegant one (https://stackoverflow.com/a/73473157/837451) :) except that groupby.count doesn't count NaNs. Since "Bin Location" in the original data has NaNs, the results are not identical; counting an always non-NaN column like "s/n" makes them match. Which answer is correct depends on your use case.
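The count-vs-size distinction is easy to demonstrate on a toy frame: `count` skips NaN values in the counted column, while `size` counts every row in the group:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Item": ["A", "A", "B"],
    "Bin Location": ["x", np.nan, "y"],
})

# count() ignores the NaN in "Bin Location"; size() counts all rows
by_count = df.groupby("Item")["Bin Location"].count()
by_size = df.groupby("Item").size()
print(by_count["A"], by_size["A"])  # 1 2
```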
