Home > Software design >  Find optimal combinations of two columns based on another column value
Find optimal combinations of two columns based on another column value

Time:01-27

So, my dataframe looks like this


index       Client  Manager   Score
0             1        1      0.89
1             1        2      0.78
2             1        3      0.65
3             2        1      0.91
4             2        2      0.77
5             2        3      0.97
6             3        1      0.35
7             3        2      0.61
8             3        3      0.81
9             4        1      0.69
10            4        2      0.22
11            4        3      0.93
12            5        1      0.78
13            5        2      0.55
14            5        3      0.44
15            6        1      0.64
16            6        2      0.99
17            6        3      0.22

My expected output looks like this


index  Client   Manager  Score
0        1        1      0.89
1        2        3      0.97
2        3        2      0.61
3        4        3      0.93
4        5        1      0.78
5        6        2      0.99

We have 3 managers and 6 clients. I want each manager to have 2 clients based on highest Score. Each manager should have only unique client, so that if one client is good for two managers, we need to take second best score and so on. May I have your suggestions? Thank you in advance.

CodePudding user response:

df = df.drop("index", axis=1)
df = df.sort_values("Score").iloc[::-1,:]
df

selected_client = []
selected_manager = []
selected_df = []
iter_rows = df.iterrows()

for i,d in iter_rows:
    client = int(d.to_frame().loc[["Client"],[i]].values[0][0])
    manager = int(d.to_frame().loc[["Manager"],[i]].values[0][0]) 

    if client not in selected_client and selected_manager.count(manager) != 2:            
        selected_client.append(client)
        selected_manager.append(manager)

        selected_df.append(d)

result = pd.concat(selected_df, axis=1, sort=False)
print(result.T)

CodePudding user response:

Try this:

df = df.sort_values('Score',ascending = False) #sort values to prioritize high scores

d = {i:[] for i in df['Manager']} #create an empty dictionary to fill in the client/manager pairs

n = 2 #set number of clients per manager
for c,m in zip(df['Client'],df['Manager']): #iterate over client and manager pairs
    if len(d.get(m))<n and c not in [c2 for i in d.values() for c2,m2 in i]: #if there are not already two pairs, and if the client has not already been added, append the pair to the list
        d.get(m).append((c,m))
    else:
        pass

ndf = pd.merge(df,pd.DataFrame([k for v in d.values() for k in v],columns = ['Client','Manager'])).sort_values('Client') #filter for just the pairs found above.

Output:

   index  Client  Manager  Score
3      0       1        1   0.89
1      5       2        3   0.97
5      7       3        2   0.61
2     11       4        3   0.93
4     12       5        1   0.78
0     16       6        2   0.99
  • Related