Home > Enterprise >  Getting Number of Months for each Item from a Date Column in Pandas
Getting Number of Months for each Item from a Date Column in Pandas

Time:02-04

I have a df with two columns: Date and Products, certain products can be available in some months and not in other months, and I need to get number of months of availability for each product.

In [1]: data = {'Date': ['04-02-2022', '04-03-2022', '04-29-2022', '04-30-2022','05-02-2022','05-04-2022','05-20-2022','05-25-2022',
                     '05-30-2022','06-02-2022','06-05-2022','06-10-2022','06-12-2022','06-25-2022','06-30-2022','07-02-2022',
            '07-02-2022','07-05-2022','07-08-2022','07-15-2022','08-02-2022','08-12-2022','08-15-2022'],
            'Products': ['prod1', 'prod1', 'prod2', 'prod3','prod1','prod2','prod1','prod3','prod3','prod1','prod2','prod3','prod2',
                        'prod3','prod1','prod2','prod2','prod2','prod3','prod3','prod3','prod3','prod3']}
        df = pd.DataFrame(data)
    
        df['Date'] = df['Date'].astype('datetime64')

When I use the below with print() I get number of months for each product.

In [2]: for product in df['Products'].unique():
            num_months = df.loc[df.Products==product].Date.dt.to_period('M')
            num_months = num_months.unique()
            num_months = num_months.size
    
            print(num_months)  
Out [3]: 3
         4
         5

Prod1 was available in 3 different months, and prod2 in 4 different months, and prod3 in 5 different months.

But then I'm trying to create a function that I can give it an argument and returns the number of availability months for each product, the function returns only the last number count of prod3.

In [4]: def months_count(product):
        for product in df['Products'].unique():
            num_of_months = df.loc[df.Products==product].Date.dt.to_period('M')
            num_of_months = num_of_months.unique()
            num_of_months = num_of_months.size
    
        return num_of_months

In [5]: months_count('prod1')
Out[5]: 5

Here the out is only the month count for prod3, what am I doing wrong?

CodePudding user response:

No need for a function, simply use groupby.nunique on the period:

out = df['Date'].dt.to_period('M').groupby(df['Products']).nunique()

Output:

Products
prod1    3
prod2    4
prod3    5
Name: Date, dtype: int64

CodePudding user response:

The months_count you defined loops on all products, it should only use the passed product as argument.

Here is the correct version of the months_count function you want:

def months_count(product):
    num_of_months = df.loc[df.Products==product].Date.dt.to_period('M')
    num_of_months = num_of_months.unique()
    num_of_months = num_of_months.size

    return num_of_months
  • Related