My dataframe has 2 columns: i) Customer ID ii) Status. For each customer, if there is any rows pertaining to that customer that contains the status 'Confirmed', it should return 'Active' in a third column, else it should return 'Inactive'. Appreciate any help please, thanks!
**Current Dataframe**
Customer ID Status
A Confirmed
A Transferred
A Confirmed
A Withdrawn
B Transferred
B Withdrawn
B Transferred
C Confirmed
D Withdrawn
---
**Expected Output**
Customer ID Status Customer Status
A Confirmed Active
A Transferred Active
A Confirmed Active
A Withdrawn Active
B Transferred Inactive
B Withdrawn Inactive
B Transferred Inactive
C Confirmed Active
D Withdrawn Inactive
CodePudding user response:
You can test if match at least one values per groups by Series.any
, because need mask with same size like original column use GroupBy.transform
, last pass to numpy.where
:
m = df['Status'].eq('Confirmed').groupby(df['Customer ID']).transform('any')
df['Customer Status'] = np.where(m, 'Active','Inactive')
print (df)
Customer ID Status Customer Status
0 A Confirmed Active
1 A Transferred Active
2 A Confirmed Active
3 A Withdrawn Active
4 B Transferred Inactive
5 B Withdrawn Inactive
6 B Transferred Inactive
7 C Confirmed Active
8 D Withdrawn Inactive
Or get all Customer ID
if match at least one value and compare original columns by Series.isin
:
m = df['Customer ID'].isin(df.loc[df['Status'].eq('Confirmed'),'Customer ID'])
df['Customer Status'] = np.where(m, 'Active','Inactive')