I have a df with two columns: Date and Products, certain products can be available in some months and not in other months, and I need to get number of months of availability for each product.
In [1]: data = {'Date': ['04-02-2022', '04-03-2022', '04-29-2022', '04-30-2022','05-02-2022','05-04-2022','05-20-2022','05-25-2022',
'05-30-2022','06-02-2022','06-05-2022','06-10-2022','06-12-2022','06-25-2022','06-30-2022','07-02-2022',
'07-02-2022','07-05-2022','07-08-2022','07-15-2022','08-02-2022','08-12-2022','08-15-2022'],
'Products': ['prod1', 'prod1', 'prod2', 'prod3','prod1','prod2','prod1','prod3','prod3','prod1','prod2','prod3','prod2',
'prod3','prod1','prod2','prod2','prod2','prod3','prod3','prod3','prod3','prod3']}
df = pd.DataFrame(data)
df['Date'] = df['Date'].astype('datetime64')
When I use the below with print()
I get number of months for each product.
In [2]: for product in df['Products'].unique():
num_months = df.loc[df.Products==product].Date.dt.to_period('M')
num_months = num_months.unique()
num_months = num_months.size
print(num_months)
Out [3]: 3
4
5
Prod1 was available in 3 different months, and prod2 in 4 different months, and prod3 in 5 different months.
But then I'm trying to create a function that I can give it an argument and returns the number of availability months for each product, the function returns only the last number count of prod3.
In [4]: def months_count(product):
for product in df['Products'].unique():
num_of_months = df.loc[df.Products==product].Date.dt.to_period('M')
num_of_months = num_of_months.unique()
num_of_months = num_of_months.size
return num_of_months
In [5]: months_count('prod1')
Out[5]: 5
Here the out is only the month count for prod3, what am I doing wrong?
CodePudding user response:
No need for a function, simply use groupby.nunique
on the period:
out = df['Date'].dt.to_period('M').groupby(df['Products']).nunique()
Output:
Products
prod1 3
prod2 4
prod3 5
Name: Date, dtype: int64
CodePudding user response:
The months_count
you defined loops on all products, it should only use the passed product
as argument.
Here is the correct version of the months_count
function you want:
def months_count(product):
num_of_months = df.loc[df.Products==product].Date.dt.to_period('M')
num_of_months = num_of_months.unique()
num_of_months = num_of_months.size
return num_of_months