Change values in Pandas cells based on value_counts() condition

Time:11-16

How can I change the values in a specific column of a pandas DataFrame based on a condition? This is my DataFrame:

import pandas as pd

df = pd.DataFrame({'data':['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple', 
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

     data
0    lemon
1    apple
2    lemon
3    apple
4    apple
5    lemon
6     pear
7    apple
8     pear
9    lemon
10    pear
11  orange
12  banana
13  banana
14    pear

Counting each element with df['data'].value_counts():

lemon     4
apple     4
pear      4
banana    2
orange    1
Name: data, dtype: int64
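As a starting point, here is a minimal sketch that uses the counts above to list which values fall below the threshold. It assumes the threshold of 4 from the question; the names counts and rare_values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'data': ['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

# Count how many times each distinct value occurs (index = value, values = count)
counts = df['data'].value_counts()

# Keep only the values whose count is below the threshold
rare_values = counts[counts < 4].index.tolist()
print(sorted(rare_values))  # ['banana', 'orange']
```

These are the values that should become 'other'.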

How can I change a value to 'other' if its value_counts() result is less than 4? Expected result:

     data
0    lemon
1    apple
2    lemon
3    apple
4    apple
5    lemon
6     pear
7    apple
8     pear
9    lemon
10    pear
11  other
12  other
13  other
14    pear

Answer:

Use Series.mask: map each value to its count with Series.map and Series.value_counts, then test whether that count is less than 4 with Series.lt:

df['data'] = df['data'].mask(df['data'].map(df['data'].value_counts()).lt(4), 'other')
# alternative: per-row group size via groupby + transform('size')
df['data'] = df['data'].mask(df.groupby('data')['data'].transform('size').lt(4), 'other')
print(df)
     data
0   lemon
1   apple
2   lemon
3   apple
4   apple
5   lemon
6    pear
7   apple
8    pear
9   lemon
10   pear
11  other
12  other
13  other
14   pear
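To see what the one-liner does, here is a minimal sketch that splits the mask/map chain into named intermediate steps. The variable names (counts, row_counts, is_rare) are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'data': ['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

# Step 1: lookup table from value to its number of occurrences
counts = df['data'].value_counts()

# Step 2: replace every row's value with the count of that value (same index as df)
row_counts = df['data'].map(counts)

# Step 3: boolean Series, True where the value occurs fewer than 4 times
is_rare = row_counts.lt(4)

# Step 4: mask substitutes 'other' where is_rare is True and keeps the rest
df['data'] = df['data'].mask(is_rare, 'other')
print(df['data'].tolist())
```

The groupby alternative produces the same per-row counts as step 2: transform('size') returns each group's size broadcast back to the original rows.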

Another answer:

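You could also use apply with a lambda. This is much slower on large frames, because it filters the whole DataFrame once for every row: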

df['data'] = df['data'].apply(lambda x : 'other' if len(df[df.data==x])<4 else x)
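The apply version above recounts the column for every row. A minimal sketch of a vectorized equivalent computes the counts once and then uses Series.isin with Series.where (a swapped-in technique, not from either answer). The helper name replace_rare and its parameters are hypothetical.

```python
import pandas as pd

def replace_rare(s: pd.Series, min_count: int, fill: str = 'other') -> pd.Series:
    """Hypothetical helper: replace values that occur fewer than min_count times."""
    counts = s.value_counts()  # computed once, not once per row
    rare = counts[counts < min_count].index
    # where keeps values where the condition is True and substitutes fill elsewhere
    return s.where(~s.isin(rare), fill)

df = pd.DataFrame({'data': ['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

vectorized = replace_rare(df['data'], 4)

# The row-wise apply approach, for comparison
row_wise = df['data'].apply(lambda x: 'other' if len(df[df.data == x]) < 4 else x)

print(vectorized.tolist() == row_wise.tolist())  # True
```

Making the threshold a parameter lets you reuse the same logic with a different cutoff, e.g. replace_rare(df['data'], 2) replaces only 'orange'.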