I have below data frame.
df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS']})
I wanted to find if the vin column contains a unique value then in the Result column it should return 'YES' and if the vin column is not unique then it will check the 'module' column and return 'YES' where the module column has more priority value.
I want output like the below data frame.
df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS'],
'Result':['YES','YES','NO','NO','NO','YES','YES','YES','NO','YES']})
Below code, I have tried and it gives the correct result but it involves too many steps.
df['count'] = df.groupby('vin').vin.transform('count')
def Check1(df):
if (df["count"] == 1):
return 1
elif ((df["count"] != 1) & (df["module"] == '1ST_PRIORITY')):
return 1
elif ((df["count"] != 1) & (df["module"] == '2ND_PRIORITY')):
return 2
elif ((df["count"] != 1) & (df["module"] == '3RD_PRIORITY')):
return 3
else:
return 4
df['Sort'] = df.apply(Check1, axis=1)
df = df.sort_values(by=['vin', 'Sort'])
df.drop_duplicates(subset=['vin'], keep='first',inplace = True)
df
CodePudding user response:
Here's the trick, you need a custom order
:
from pandas.api.types import CategoricalDtype
#create your custom order
custom_order = CategoricalDtype(
['Delhi','Agra','Paris','ABS','HELLO','NORMAL'],
ordered=True)
#then attribute it to the desired column
df['module'] = df['module'].astype(custom_order)
df['Result'] = ((~df.sort_values('module', ascending=True).duplicated('vin'))
.replace({True: 'YES', False: 'NO'}))
Result:
index | vin | module | Result |
---|---|---|---|
0 | aaa | NORMAL | YES |
1 | bbb | Delhi | YES |
2 | bbb | Agra | NO |
3 | bbb | HELLO | NO |
4 | ccc | Paris | NO |
5 | ccc | Agra | YES |
6 | ddd | Agra | YES |
7 | eee | Paris | YES |
8 | eee | HELLO | NO |
9 | fff | ABS | YES |
CodePudding user response:
IIUC, you can use duplicated
after sort_values
:
df['Result'] = ((~df.sort_values('module').duplicated('vin'))
.replace({True: 'YES', False: 'NO'}))
print(df)
# Output
vin module Result
0 aaa NORMAL YES
1 bbb 1ST_PRIORITY YES
2 bbb 2ND_PRIORITY NO
3 bbb HELLO NO
4 ccc 3RD_PRIORITY NO
5 ccc 2ND_PRIORITY YES
6 ddd 2ND_PRIORITY YES
7 eee 3RD_PRIORITY YES
8 eee HELLO NO
9 fff ABS YES