I have a problem. I only want the top 3 customers to receive the names, all other customers should be shown as Other
. This should all be written in the column name2
. The top 3 customers are simply determined based on frequency. The problem is, I get my v
with the values. However, how can I say that as long as the customers contained in v.index[v.gt(2)])
are to receive the name
and the rest Other
in the column name2
.
Dataframe
customerId name
0 1 max
1 1 max
2 2 lisa
3 2 lisa
4 2 lisa
5 2 lisa
6 3 michael
7 3 michael
8 3 michael
9 4 power
10 5 wind
11 5 wind
12 5 wind
13 5 wind
14 5 wind
Code
import pandas as pd
d = {
"customerId": [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 5],
"name": ['max', 'max', 'lisa', 'lisa', 'lisa', 'lisa', 'michael', 'michael', 'michael', 'power',
'wind', 'wind', 'wind', 'wind', 'wind',]
}
df = pd.DataFrame(data=d)
print(df)
v = df['customerId'].value_counts()
df['name2'] = 'Other'
df.loc[df['customerId']].isin(v.index[v.gt(2)]), 'name2'] = df['name']
Desired output
customerId name name2
0 1 max Other
1 1 max Other
2 2 lisa lisa
3 2 lisa lisa
4 2 lisa lisa
5 2 lisa lisa
6 3 michael michael
7 3 michael michael
8 3 michael michael
9 4 power Other
10 5 wind wind
11 5 wind wind
12 5 wind wind
13 5 wind wind
14 5 wind wind
CodePudding user response:
v = df['customerId'].value_counts()
df["name2"] = np.where(df['customerId'].isin(v.nlargest(3).index), df["name"], "Other")
customerId name name2
0 1 max Other
1 1 max Other
2 2 lisa lisa
3 2 lisa lisa
4 2 lisa lisa
5 2 lisa lisa
6 3 michael michael
7 3 michael michael
8 3 michael michael
9 4 power Other
10 5 wind wind
11 5 wind wind
12 5 wind wind
13 5 wind wind
14 5 wind wind
CodePudding user response:
Your problem is simple. Take this as a correction:
df.loc[df['customerId'].isin(v.index[v.gt(2)]), 'name2'] = df['name']