I am stuck in writing the python code for below problem-CodePudding

I have below data frame.

df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS']})

I wanted to find if the vin column contains a unique value then in the Result column it should return 'YES' and if the vin column is not unique then it will check the 'module' column and return 'YES' where the module column has more priority value.

I want output like the below data frame.

df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS'],
               'Result':['YES','YES','NO','NO','NO','YES','YES','YES','NO','YES']})

Below code, I have tried and it gives the correct result but it involves too many steps.

df['count'] = df.groupby('vin').vin.transform('count')


def Check1(df):
    if (df["count"]  == 1):
        return 1

elif ((df["count"]  != 1) & (df["module"]  == '1ST_PRIORITY')):
    return 1

elif ((df["count"]  != 1) & (df["module"]  == '2ND_PRIORITY')):
    return 2

elif ((df["count"]  != 1) & (df["module"]  == '3RD_PRIORITY')):
    return 3
    
else:
    return 4

df['Sort'] = df.apply(Check1, axis=1)

df = df.sort_values(by=['vin', 'Sort'])

df.drop_duplicates(subset=['vin'], keep='first',inplace = True)

df

CodePudding user response：

Here's the trick, you need a custom order:

from pandas.api.types import CategoricalDtype

#create your custom order
custom_order = CategoricalDtype(
    ['Delhi','Agra','Paris','ABS','HELLO','NORMAL'], 
    ordered=True)

#then attribute it to the desired column
df['module'] = df['module'].astype(custom_order)


df['Result'] = ((~df.sort_values('module', ascending=True).duplicated('vin'))
                    .replace({True: 'YES', False: 'NO'}))

Result:

index	vin	module	Result
0	aaa	NORMAL	YES
1	bbb	Delhi	YES
2	bbb	Agra	NO
3	bbb	HELLO	NO
4	ccc	Paris	NO
5	ccc	Agra	YES
6	ddd	Agra	YES
7	eee	Paris	YES
8	eee	HELLO	NO
9	fff	ABS	YES

CodePudding user response：

IIUC, you can use duplicated after sort_values:

df['Result'] = ((~df.sort_values('module').duplicated('vin'))
                    .replace({True: 'YES', False: 'NO'}))
print(df)

# Output
   vin        module Result
0  aaa        NORMAL    YES
1  bbb  1ST_PRIORITY    YES
2  bbb  2ND_PRIORITY     NO
3  bbb         HELLO     NO
4  ccc  3RD_PRIORITY     NO
5  ccc  2ND_PRIORITY    YES
6  ddd  2ND_PRIORITY    YES
7  eee  3RD_PRIORITY    YES
8  eee         HELLO     NO
9  fff           ABS    YES