Home > database >  How to replace a value in a column based on the its value count
How to replace a value in a column based on the its value count

Time:08-24

I am working on the following dataset: https://drive.google.com/file/d/1UVgSfIO-46aLKHeyk2LuKV6nVyFjBdWX/view?usp=sharing

I am trying to replace the countries in the "Nationality" column whose value_counts() are less than 450 with the value of "Others".

def collapse_category(df):
   df.loc[df['Nationality'].map(df['Nationality'].value_counts(normalize=True)
                                  .lt(450)), 'Nationality'] = 'Others'
   print(df['Nationality'].unique())

This is the code I used but it returns the result as this: ['Others'] Here is the link to my notebook for reference: https://colab.research.google.com/drive/1MfwwBfi9_4E1BaZcPnS7KJjTy8xVsgZO?usp=sharing

CodePudding user response:

Use boolean indexing:

s = df['Nationality'].value_counts()
df.loc[df['Nationality'].isin(s[s<450].index), 'Nationality'] = 'Others'

New value_counts after the change:

FRA       12307
PRT       11382
DEU       10164
GBR        8610
Others     5354
ESP        4864
USA        3398
...         ...
FIN         632
RUS         578
ROU         475
Name: Nationality, dtype: int64

CodePudding user response:

value_filter = df.Nationality.value_counts().lt(450)

temp_dict = value_filter[value_filter == False].replace({False: "others"}).to_dict()

df = df.replace(temp_dict)

In general, the third line will look up the entire df rather than a particular column. But the above code will work for you.

  • Related