Home > Mobile >  Write out only the best X customer names and mark the rest as "Other"
Write out only the best X customer names and mark the rest as "Other"

Time:07-10

I have a problem. I only want the top 3 customers to receive the names, all other customers should be shown as Other. This should all be written in the column name2. The top 3 customers are simply determined based on frequency. The problem is, I get my v with the values. However, how can I say that as long as the customers contained in v.index[v.gt(2)]) are to receive the name and the rest Other in the column name2.

Dataframe

    customerId     name
0            1      max
1            1      max
2            2     lisa
3            2     lisa
4            2     lisa
5            2     lisa
6            3  michael
7            3  michael
8            3  michael
9            4    power
10           5     wind
11           5     wind
12           5     wind
13           5     wind
14           5     wind

Code

import pandas as pd
d = {
    "customerId": [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 5],
     "name": ['max', 'max', 'lisa', 'lisa', 'lisa', 'lisa', 'michael', 'michael', 'michael', 'power',
              'wind', 'wind', 'wind', 'wind', 'wind',]
}
df = pd.DataFrame(data=d)
print(df)

v = df['customerId'].value_counts()
df['name2'] = 'Other'
df.loc[df['customerId']].isin(v.index[v.gt(2)]), 'name2'] = df['name']

Desired output

    customerId     name  name2
0            1      max  Other
1            1      max  Other
2            2     lisa  lisa
3            2     lisa  lisa
4            2     lisa  lisa
5            2     lisa  lisa
6            3  michael  michael
7            3  michael  michael
8            3  michael  michael
9            4    power  Other
10           5     wind  wind
11           5     wind  wind
12           5     wind  wind
13           5     wind  wind
14           5     wind  wind

CodePudding user response:

v = df['customerId'].value_counts()

df["name2"] = np.where(df['customerId'].isin(v.nlargest(3).index), df["name"], "Other")

    customerId     name    name2
0            1      max    Other
1            1      max    Other
2            2     lisa     lisa
3            2     lisa     lisa
4            2     lisa     lisa
5            2     lisa     lisa
6            3  michael  michael
7            3  michael  michael
8            3  michael  michael
9            4    power    Other
10           5     wind     wind
11           5     wind     wind
12           5     wind     wind
13           5     wind     wind
14           5     wind     wind

CodePudding user response:

Your problem is simple. Take this as a correction:

df.loc[df['customerId'].isin(v.index[v.gt(2)]), 'name2'] = df['name']
  • Related