Create column with boolean values based on condition

Time:11-18

I have a DataFrame in which each row is an order from a customer. I want to add a boolean column, 'recurring_customer', that is True when the customer has already ordered twice before. So the third time a customer places an order, that row gets a True value.

The DataFrame looks like this:

import pandas as pd

df = pd.DataFrame({
    'customer_id': ['5257', '8034', '21474', '21474', '21474', '6157']
})

The desired output should look like this:

df = pd.DataFrame({
    'customer_id': ['5257', '8034', '21474', '21474', '21474', '6157'],
    'recurring_customer': [False, False, False, False, True, False]
})

I guess I have to use the np.where function but I don't know how to use it with unique and non-unique values. Could you help me with the last bit?

df['recurring_customer'] = np.where(df['customer_id'] 

CodePudding user response:

Use groupby followed by cumcount, which numbers each customer's orders starting from 0:

df['recurring_customer'] = df.groupby('customer_id').cumcount() >= 2  # use == 2 to flag only the third order
print(df)

# Output:
  customer_id  recurring_customer
0        5257               False
1        8034               False
2       21474               False
3       21474               False
4       21474                True
5        6157               False
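
Since the question mentions np.where and the answer leaves open whether to use >= 2 or == 2, here is a minimal sketch comparing the variants. It assumes one extra order for customer 21474 (not in the original data) so that the two comparisons give different results; the column names third_or_later, exactly_third and label are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: same data as in the question, plus a fourth order for 21474
df = pd.DataFrame({
    'customer_id': ['5257', '8034', '21474', '21474', '21474', '21474', '6157']
})

# 0-based position of each order within its customer's group
order_index = df.groupby('customer_id').cumcount()

# True from the third order onwards
df['third_or_later'] = order_index >= 2

# True on the third order only
df['exactly_third'] = order_index == 2

# np.where is only needed to map the condition to values other than True/False
df['label'] = np.where(order_index >= 2, 'recurring', 'new')

print(df)
```

The comparison already returns a boolean Series, so np.where adds nothing when the goal is a plain True/False column.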