I have a DataFrame in which each row is an order from a customer. I want to create a column that is True when the customer has already ordered at least twice before, and False otherwise. So the third time a customer places an order, the 'recurring_customer' column gets the value True.
The DataFrame looks like this:
import pandas as pd

df = pd.DataFrame({
    'customer_id': ['5257', '8034', '21474', '21474', '21474', '6157']
})
The desired output should look like this:
df = pd.DataFrame({
    'customer_id': ['5257', '8034', '21474', '21474', '21474', '6157'],
    'recurring_customer': [False, False, False, False, True, False]
})
I guess I have to use np.where, but I don't know how to handle customer IDs that appear only once versus several times. Could you help me finish this line?
df['recurring_customer'] = np.where(df['customer_id']
Answer:
Use groupby with cumcount:
df['recurring_customer'] = df.groupby('customer_id').cumcount() >= 2  # True from the 3rd order on; use == 2 to flag only the 3rd order
print(df)
# Output:
customer_id recurring_customer
0 5257 False
1 8034 False
2 21474 False
3 21474 False
4 21474 True
5 6157 False
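To see why this works, here is a small sketch on the question's sample data that prints the intermediate cumcount values and also shows how the same condition could be wrapped in np.where, as the question originally attempted. The column name third_order is hypothetical and exists only to contrast >= 2 with == 2. With this sample data both columns come out identical; they would differ only for a customer with four or more orders.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer_id': ['5257', '8034', '21474', '21474', '21474', '6157']
})

# cumcount() numbers the rows within each customer_id group starting at 0,
# in the original row order, so it equals the number of earlier orders.
prior_orders = df.groupby('customer_id').cumcount()
print(prior_orders.tolist())  # [0, 0, 0, 1, 2, 0]

# np.where is not strictly needed (the comparison already gives booleans),
# but it matches the original attempt and allows custom labels instead of True/False.
df['recurring_customer'] = np.where(prior_orders >= 2, True, False)

# Hypothetical column: flag only the exact third order, not every later one.
df['third_order'] = prior_orders == 2
print(df)
```

If you want labels such as 'yes'/'no' instead of booleans, np.where(prior_orders >= 2, 'yes', 'no') would be one way to get them.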