How can I change the values in a specific column of a pandas DataFrame based on a condition? This is my DataFrame:
import pandas as pd
df = pd.DataFrame({'data':['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})
data
0 lemon
1 apple
2 lemon
3 apple
4 apple
5 lemon
6 pear
7 apple
8 pear
9 lemon
10 pear
11 orange
12 banana
13 banana
14 pear
Counting each element:
lemon 4
apple 4
pear 4
banana 2
orange 1
Name: data, dtype: int64
How can I change a value to 'other' if its value_counts() result is less than 4? Expected result:
data
0 lemon
1 apple
2 lemon
3 apple
4 apple
5 lemon
6 pear
7 apple
8 pear
9 lemon
10 pear
11 other
12 other
13 other
14 pear
CodePudding user response:
Use Series.mask with the per-row counts obtained from Series.map and Series.value_counts, and test whether each count is less than 4:
df['data'] = df['data'].mask(df['data'].map(df['data'].value_counts()).lt(4), 'other')
#alternative
df['data'] = df['data'].mask(df.groupby('data')['data'].transform('size').lt(4), 'other')
print(df)
data
0 lemon
1 apple
2 lemon
3 apple
4 apple
5 lemon
6 pear
7 apple
8 pear
9 lemon
10 pear
11 other
12 other
13 other
14 pear
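To make the one-liner easier to follow, here is a minimal sketch that splits it into separate steps. The intermediate names (counts, per_row_counts, is_rare, result) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'data': ['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

# Step 1: count how often each distinct value occurs (index = value, data = count).
counts = df['data'].value_counts()

# Step 2: look up the count for every row, giving a Series aligned with df.
per_row_counts = df['data'].map(counts)

# Step 3: boolean mask that is True where the value occurs fewer than 4 times.
is_rare = per_row_counts.lt(4)

# Step 4: replace the rows where the mask is True with 'other'.
result = df['data'].mask(is_rare, 'other')
print(result)
```

Assigning result back to df['data'] gives the same frame as the one-liner.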
CodePudding user response:
We could apply a function like this:
df['data'] = df['data'].apply(lambda x : 'other' if len(df[df.data==x])<4 else x)
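Note that the lambda filters the whole frame once per row, so the work grows roughly quadratically with the number of rows. A possible variant, sketched here under the assumption that the threshold stays at 4, computes the counts once and keeps only the frequent values with Series.where and Series.isin. The name frequent is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'data': ['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

# Count every value once instead of re-filtering the frame for each row.
counts = df['data'].value_counts()

# Values that occur at least 4 times and should be kept unchanged.
frequent = counts[counts >= 4].index

# Keep rows whose value is frequent; replace all other rows with 'other'.
df['data'] = df['data'].where(df['data'].isin(frequent), 'other')
print(df)
```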