I'm trying to count the unique number of Customer_Key where the column Broad_Category has both the values A and B grouped by values in column Month. The sample dataframe is as follows
Customer_Key | Category | Month |
---|---|---|
ck123 | A | 2 |
ck234 | A | 2 |
ck234 | B | 2 |
ck680 | A | 3 |
ck123 | B | 3 |
ck123 | A | 3 |
ck356 | B | 3 |
ck345 | A | 4 |
The expected outcome is
Month | Unique Customers |
---|---|
2 | 1 |
3 | 1 |
4 | 0 |
I'm not able to think of something here. Any lead/help will be appreciated. Thanks in advance.emphasized text
CodePudding user response:
Here is one way to accomplish it
first its grouping by Month and Customer that get us the customer within a month with the count of the categories. The result is further grouped by Month and we choose the max count.
Decrementing the count give us the required count of unique customer belonging to both categories
hope it helps
df2=df.groupby(['Month','Customer_Key']).count().reset_index().groupby(['Month'])['Category'].max().reset_index()
df2['Category'] = df2['Category'] -1
df2.rename(columns={'Category': 'Unique Cusomter'}, inplace=True)
df2
Month Unique Cusomter
0 2 1
1 3 1
2 4 0
CodePudding user response:
Try something like that:
df.groupby(['Customer_Key', 'Month']) \
.sum() \
.query("Category in ('AB','BA')") \
.groupby('Month') \
.count() \
.rename(columns={'Category': 'Unique Customers'})
Edit...
The issue with this solution is that it does not count months with 0. I have prepared a fix:
import pandas as pd
import sys
if sys.version_info[0] < 3:
from StringIO import StringIO
else:
from io import StringIO
data = StringIO("""ck123 A 2
ck234 A 2
ck234 B 2
ck680 A 3
ck123 B 3
ck123 A 3
ck356 B 3
ck345 A 4""")
df1 = df.groupby(['Customer_Key', 'Month']) \
.sum() \
.reset_index()
def map_categories(column):
if column['Category'] in ('AB', 'BA'):
return 1
else:
return 0
df1['Unique Customers'] = df1.apply(lambda row: map_categories(row), axis=1)
df1 = df1.groupby('Month')['Unique Customers'].sum().reset_index()