Home > Mobile >  Pandas Python - How to group customer IDs, examine if a column contains a text, and have correspondi
Pandas Python - How to group customer IDs, examine if a column contains a text, and have correspondi

Time:06-14

My dataframe has 2 columns: i) Customer ID ii) Status. For each customer, if there is any rows pertaining to that customer that contains the status 'Confirmed', it should return 'Active' in a third column, else it should return 'Inactive'. Appreciate any help please, thanks!

**Current Dataframe**

Customer ID    Status         
A              Confirmed      
A              Transferred    
A              Confirmed       
A              Withdrawn       
B              Transferred     
B              Withdrawn       
B              Transferred     
C              Confirmed      
D              Withdrawn   

---

**Expected Output**

Customer ID    Status          Customer Status
A              Confirmed       Active
A              Transferred     Active 
A              Confirmed       Active
A              Withdrawn       Active
B              Transferred     Inactive
B              Withdrawn       Inactive
B              Transferred     Inactive
C              Confirmed       Active
D              Withdrawn       Inactive

CodePudding user response:

You can test if match at least one values per groups by Series.any, because need mask with same size like original column use GroupBy.transform, last pass to numpy.where:

m = df['Status'].eq('Confirmed').groupby(df['Customer ID']).transform('any')
df['Customer Status'] = np.where(m, 'Active','Inactive')

print (df)
  Customer ID       Status Customer Status
0           A    Confirmed          Active
1           A  Transferred          Active
2           A    Confirmed          Active
3           A    Withdrawn          Active
4           B  Transferred        Inactive
5           B    Withdrawn        Inactive
6           B  Transferred        Inactive
7           C    Confirmed          Active
8           D    Withdrawn        Inactive

Or get all Customer ID if match at least one value and compare original columns by Series.isin:

m = df['Customer ID'].isin(df.loc[df['Status'].eq('Confirmed'),'Customer ID'])
df['Customer Status'] = np.where(m, 'Active','Inactive')
  • Related