How to assign new value that is not in my list with the value that occurs least number of times in a

Time:03-30

I have a list containing the expected values of a column named 'BROKER':


list1 = ['NATIONAL DISTRIBUTOR', 'MUTUAL FUND DISTRIBUTOR', 'BANKS',
       'DIRECT']

I have a dataframe as df1

df1:

     PAN_NO             BROKER
0     AAA         NATIONAL DISTRIBUTOR
1     BBB         MUTUAL FUND DISTRIBUTOR
2     CCC               BANKS
3     DDD               BANKS
4     EEE               BANKS
5     FFF         NATIONAL DISTRIBUTOR
6     GGG         NATIONAL DISTRIBUTOR
7     HHH               RIA

and so on ..

I have a condition:

If the BROKER column in df1 contains any value that is not in list1, that value should be replaced with the value that occurs the fewest times in df1.

For example, in the df1 shown above, MUTUAL FUND DISTRIBUTOR occurs the fewest times, so the new value 'RIA' should be replaced with MUTUAL FUND DISTRIBUTOR.

Expected Output:

     PAN_NO             BROKER
0     AAA         NATIONAL DISTRIBUTOR
1     BBB         MUTUAL FUND DISTRIBUTOR
2     CCC               BANKS
3     DDD               BANKS
4     EEE               BANKS
5     FFF         NATIONAL DISTRIBUTOR
6     GGG         NATIONAL DISTRIBUTOR
7     HHH         MUTUAL FUND DISTRIBUTOR

I tried the following:


col = df1.BROKER.unique()

for i in col:
   if i not in list1:
       i = df1['BROKER'].min()  # I know this is incorrect. :(


If two values in df1 occur the same number of times, the new value can be replaced with either of them.

Any help would be appreciated

CodePudding user response:

I'm not 100% sure this is what you need, but for the given example this code does the trick:

df1.loc[~df1['BROKER'].isin(list1),'BROKER'] = df1['BROKER'].value_counts().idxmin()
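One thing to watch with the one-liner: value_counts() also counts the unlisted values themselves, so when 'RIA' and 'MUTUAL FUND DISTRIBUTOR' both occur once, the minimum could in principle land on 'RIA' depending on how ties are ordered. A possible variant, shown here as a sketch rather than a definitive fix, counts only the rows whose value is already in list1. The names known and least_common are just illustrative, and the DataFrame is rebuilt from the sample in the question:

```python
import pandas as pd

list1 = ['NATIONAL DISTRIBUTOR', 'MUTUAL FUND DISTRIBUTOR', 'BANKS', 'DIRECT']

# Sample data copied from the question
df1 = pd.DataFrame({
    'PAN_NO': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'FFF', 'GGG', 'HHH'],
    'BROKER': ['NATIONAL DISTRIBUTOR', 'MUTUAL FUND DISTRIBUTOR', 'BANKS',
               'BANKS', 'BANKS', 'NATIONAL DISTRIBUTOR',
               'NATIONAL DISTRIBUTOR', 'RIA'],
})

# Boolean mask: True where the broker is one of the allowed values
known = df1['BROKER'].isin(list1)

# Least frequent value among the allowed brokers that actually appear in df1
# (assumes at least one row holds an allowed value)
least_common = df1.loc[known, 'BROKER'].value_counts().idxmin()

# Overwrite every unlisted broker with that value
df1.loc[~known, 'BROKER'] = least_common

print(df1)
```

Values from list1 that never appear in df1 (such as 'DIRECT' here) are not considered, because value_counts() only reports values that are present.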