Home > Blockchain >  I am stuck in writing the python code for below problem
I am stuck in writing the python code for below problem

Time:06-04

I have below data frame.

df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS']})

I wanted to find if the vin column contains a unique value then in the Result column it should return 'YES' and if the vin column is not unique then it will check the 'module' column and return 'YES' where the module column has more priority value.

I want output like the below data frame.

df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS'],
               'Result':['YES','YES','NO','NO','NO','YES','YES','YES','NO','YES']})

Below code, I have tried and it gives the correct result but it involves too many steps.

df['count'] = df.groupby('vin').vin.transform('count')


def Check1(df):
    if (df["count"]  == 1):
        return 1

elif ((df["count"]  != 1) & (df["module"]  == '1ST_PRIORITY')):
    return 1

elif ((df["count"]  != 1) & (df["module"]  == '2ND_PRIORITY')):
    return 2

elif ((df["count"]  != 1) & (df["module"]  == '3RD_PRIORITY')):
    return 3
    
else:
    return 4

df['Sort'] = df.apply(Check1, axis=1)

df = df.sort_values(by=['vin', 'Sort'])

df.drop_duplicates(subset=['vin'], keep='first',inplace = True)

df

CodePudding user response:

Here's the trick, you need a custom order:

from pandas.api.types import CategoricalDtype

#create your custom order
custom_order = CategoricalDtype(
    ['Delhi','Agra','Paris','ABS','HELLO','NORMAL'], 
    ordered=True)

#then attribute it to the desired column
df['module'] = df['module'].astype(custom_order)


df['Result'] = ((~df.sort_values('module', ascending=True).duplicated('vin'))
                    .replace({True: 'YES', False: 'NO'}))

Result:

index vin module Result
0 aaa NORMAL YES
1 bbb Delhi YES
2 bbb Agra NO
3 bbb HELLO NO
4 ccc Paris NO
5 ccc Agra YES
6 ddd Agra YES
7 eee Paris YES
8 eee HELLO NO
9 fff ABS YES

CodePudding user response:

IIUC, you can use duplicated after sort_values:

df['Result'] = ((~df.sort_values('module').duplicated('vin'))
                    .replace({True: 'YES', False: 'NO'}))
print(df)

# Output
   vin        module Result
0  aaa        NORMAL    YES
1  bbb  1ST_PRIORITY    YES
2  bbb  2ND_PRIORITY     NO
3  bbb         HELLO     NO
4  ccc  3RD_PRIORITY     NO
5  ccc  2ND_PRIORITY    YES
6  ddd  2ND_PRIORITY    YES
7  eee  3RD_PRIORITY    YES
8  eee         HELLO     NO
9  fff           ABS    YES
  • Related