Hope you can help me.
I have the following dataframe named df (see table below)
I have two sales reps - sales_rep_1 and sales_rep_2.
I need to assign either of them to a a letter in the table. A to Z
For each row I would like to assign either sales_rep_1 or sales_rep_2 to a new column - df['Rep']
The only condition is that the sum of each column: DM Bookings, Revenue and AtL Opps needs to be split as evenly as possible between each sales rep.
Letter | DM Bookings | Revenue | AtL Opps |
---|---|---|---|
A | 6.0 | 42506.0 | 34 |
B | 2.0 | 21055.0 | 41 |
C | 1.0 | 6307.0 | 36 |
D | 0.0 | 8254.0 | 14 |
E | 1.0 | 29878.0 | 38 |
F | 0.0 | 6911.0 | 10 |
G | 1.0 | 6735.0 | 19 |
H | 0.0 | 0.0 | 80 |
Is there a way to do this? The result does not need to be perfect, nor exact
Any help would be great.
Thanks in advance
CodePudding user response:
This is an optimization problem. So, does scipy have a ready-made solution for this? Close, but not quite - it has linear programming solvers which are very general but not that easy to use.
With thanks to SO answer Linear sum assignment (SciPy) and balancing the costs we can use a linear programming solver to set up our conditions and solve for best worker assignment with two workers. Here we just adapt that answer by @joni.
We'll need to define a task cost so that we can base the solution on an optimized distribution of task cost between reps or "workers".
import pulp
import numpy as np
workers = ["A", "B"]
n_workers = len(workers)
n_tasks = len(df)
columns = ["Revenue", "AtL Opps", "DM Bookings"]
Rescale and weigh columns - matrix multiply by [1, 1/3, 1/9] here to give decreasing weight to the columns in the order they were given.
task_weights = (df[columns] / df[columns].max(axis=0))
## combine column weights by multiplying them by [1, 1/3, 1/9] to de-prioritize later columns
task_weight = task_weights @ 3.**(-np.arange(len(columns)))
Then define the constrained optimization problem
# actual cost matrix - one row per worker
# in our case, cost is the same regardless of which rep handles it
c = np.stack([task_weight] * len(workers), axis=0)
# create the model
mdl = pulp.LpProblem("even_assignment")
# decision variables
x = {}
for w in workers:
for t in range(n_tasks):
x[w, t] = pulp.LpVariable(f"x[{w}, {t}]", cat="Binary")
max_val = pulp.LpVariable("max_val", cat="Continous")
min_val = pulp.LpVariable("min_val", cat="Continous")
# objective: minimize the difference between the maximum and the minimum
# costs per worker
mdl.setObjective(max_val - min_val)
# constraint: each task can only be assigned once
for task in range(n_tasks):
mdl.addConstraint(sum(x[w, task] for w in workers) == 1)
# constraint: evenly distribute the tasks
for i_w, w in enumerate(workers):
assignment_cost = sum(x[w, task] * c[i_w, task] for task in range(n_tasks))
mdl.addConstraint(assignment_cost <= max_val)
mdl.addConstraint(assignment_cost >= min_val)
# solve the problem
mdl.solve()
df_result = df.copy()
# Output
for i_w, w in enumerate(workers):
worker_cost = sum(x[w, t].varValue*c[i_w, t] for t in range(n_tasks))
print(f"scaled cost for rep {w}: {worker_cost:.2f}")
worker_task = [x[w, t].varValue for t in range(n_tasks)]
df_result[w] = worker_task
print()
df_result = (df_result.assign(Rep = lambda df: np.select([df[w] == 1 for w in workers], [w.upper() for w in workers], default=None))
.drop(columns=workers))
df_result
scaled cost for rep A: 2.10
scaled cost for rep B: 2.09
Letter DM Bookings Revenue AtL Opps Rep
0 A 6.0 42506.0 34 B
1 B 2.0 21055.0 41 A
2 C 1.0 6307.0 36 A
3 D 0.0 8254.0 14 B
4 E 1.0 29878.0 38 A
5 F 0.0 6911.0 10 A
6 G 1.0 6735.0 19 B
7 H 0.0 0.0 80 B
Evaluation time, how did we split?
pd.concat([df_result.groupby("Rep").size().rename("count"), df_result.groupby("Rep").sum()], axis=1)
count DM Bookings Revenue AtL Opps
Rep
A 4 4.0 64151.0 125
B 4 7.0 57495.0 147
Different weights per column will adjust the split differently. For example using [1, 0.1, 0.01] would have prioritized Revenue more.
If balancing the number of items assigned to each rep is also wanted, then add a synthetic column with all ones and give it an appropriate priority and weight.