I have a list containing the allowed values of a column named 'BROKER':
list1 = ['NATIONAL DISTRIBUTOR', 'MUTUAL FUND DISTRIBUTOR', 'BANKS',
'DIRECT']
I also have a DataFrame, df1:
  PAN_NO  BROKER
0 AAA     NATIONAL DISTRIBUTOR
1 BBB     MUTUAL FUND DISTRIBUTOR
2 CCC     BANKS
3 DDD     BANKS
4 EEE     BANKS
5 FFF     NATIONAL DISTRIBUTOR
6 GGG     NATIONAL DISTRIBUTOR
7 HHH     RIA
and so on ..
I have a condition:
If the BROKER column in df1 contains any value that is not in list1, that value should be replaced with the value that occurs the fewest times in df1.
For example, in the df1 shown above, MUTUAL FUND DISTRIBUTOR occurs the fewest times, so the new value 'RIA' should be replaced with MUTUAL FUND DISTRIBUTOR.
Expected Output:
  PAN_NO  BROKER
0 AAA     NATIONAL DISTRIBUTOR
1 BBB     MUTUAL FUND DISTRIBUTOR
2 CCC     BANKS
3 DDD     BANKS
4 EEE     BANKS
5 FFF     NATIONAL DISTRIBUTOR
6 GGG     NATIONAL DISTRIBUTOR
7 HHH     MUTUAL FUND DISTRIBUTOR
I tried the following:
col = df1.BROKER.unique()
for i in col:
    if i not in list1:
        i = df1['BROKER'].min()  # I know this is incorrect. :(
If two values occur the same number of times in df1, the new value can be replaced with either of them.
Any help would be appreciated.
CodePudding user response:
I'm not 100% sure this is what you need, but for the given example this code does the trick:
df1.loc[~df1['BROKER'].isin(list1),'BROKER'] = df1['BROKER'].value_counts().idxmin()
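One caveat with the one-liner: value_counts() also counts the unknown values themselves, so if an unknown value such as 'RIA' is tied for the lowest count, idxmin() may return it as its own replacement. A minimal sketch of a variant that counts only the values found in list1 is below. The sample DataFrame is rebuilt from the question, and the names known, counts and rarest are only illustrative. It assumes that "occurs least" means among allowed values that actually appear in df1, so 'DIRECT', which has zero rows, is never chosen.

```python
import pandas as pd

list1 = ['NATIONAL DISTRIBUTOR', 'MUTUAL FUND DISTRIBUTOR', 'BANKS', 'DIRECT']

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    'PAN_NO': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'FFF', 'GGG', 'HHH'],
    'BROKER': ['NATIONAL DISTRIBUTOR', 'MUTUAL FUND DISTRIBUTOR', 'BANKS',
               'BANKS', 'BANKS', 'NATIONAL DISTRIBUTOR',
               'NATIONAL DISTRIBUTOR', 'RIA'],
})

# Boolean mask: True where BROKER is one of the allowed values.
known = df1['BROKER'].isin(list1)

# Count only the allowed values, so an unknown value can never be
# picked as its own replacement.
counts = df1.loc[known, 'BROKER'].value_counts()

# Label with the smallest count. On a tie, one of the tied labels is
# returned, which the question says is acceptable.
rarest = counts.idxmin()

# Overwrite every row whose BROKER is not in list1.
df1.loc[~known, 'BROKER'] = rarest

print(df1)
```

If df1 contains no allowed values at all, counts is empty and idxmin() raises an error, so that case would need separate handling.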