Let's assume I have 100 different kinds of items; each item has a name and a physical weight. I know the names of all 100 items but the weight of only 80 of them.
When I ship items, I pack them in groups of 10 and sum the weight of those items. Because some items are missing their weight, this gives an inaccurate sum when I'm about to ship.
I have different shipments with missing weights:
Shipment 1
Item Name | Item Weight |
---|---|
Item 2 | 10 |
Item 27 | 20 |
Item 42 | 20 |
Item 71 | - |
Item 77 | - |
Total weight: 75
Shipment 2
Item Name | Item Weight |
---|---|
Item 2 | 10 |
Item 27 | 20 |
Item 42 | 20 |
Item 71 | - |
Item 92 | - |
Total weight: 90
Shipment 3
Item Name | Item Weight |
---|---|
Item 2 | 10 |
Item 27 | 20 |
Item 42 | 20 |
Item 55 | 35 |
Item 77 | - |
Total weight: 100
Since some of the shipments share the same items with missing weights, and I have each shipment's total weight, is there a way with machine learning to determine the weight of these items without unpacking the entire shipment? Or would it just be, in this case, a 100x3 matrix with a lot of empty values?
At this point I'm not really sure if I should use some type of regression to solve this, or if it's just a matrix that would expand a lot if I had n more items to ship. I also wondered if this was some type of knapsack problem, but I hope someone can guide me in the right direction.
CodePudding user response:
Forget about machine learning. This is a simple system of linear equations. Each equation comes from subtracting the known item weights from a shipment's total (e.g. Shipment 1: 75 - 10 - 20 - 20 = 25):
w_71 + w_77 = 25
w_71 + w_92 = 40
w_77 = 15
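For more shipments you wouldn't want to write these equations by hand. A minimal sketch of deriving the matrix form programmatically, assuming (hypothetically) that each shipment is stored as a dict of its known weights, the names of the items with unknown weight, and the measured total:

```python
import numpy as np

# Hypothetical input format: per shipment, the known item weights,
# the names of the items with unknown weight, and the measured total.
shipments = [
    {"known": [10, 20, 20], "unknown": ["w71", "w77"], "total": 75},
    {"known": [10, 20, 20], "unknown": ["w71", "w92"], "total": 90},
    {"known": [10, 20, 20, 35], "unknown": ["w77"], "total": 100},
]

# Collect the unknown variables in a stable order.
unknowns = sorted({name for s in shipments for name in s["unknown"]})

# One equation per shipment: sum of unknowns = total - sum of knowns.
A = np.zeros((len(shipments), len(unknowns)))
b = np.zeros(len(shipments))
for i, s in enumerate(shipments):
    for name in s["unknown"]:
        A[i, unknowns.index(name)] = 1
    b[i] = s["total"] - sum(s["known"])

print(unknowns)  # ['w71', 'w77', 'w92']
print(b)         # [25. 40. 15.]
```

The resulting `A` and `b` can be fed to any of the solvers below.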
You can solve it with `sympy.solvers.solveset.linsolve`, `scipy.optimize.linprog`, `scipy.linalg.lstsq`, or `numpy.linalg.lstsq`.
- `sympy.linsolve` is maybe the easiest to understand if you are not familiar with matrices; however, if the system is underdetermined, then instead of returning a particular solution to the system, `sympy.linsolve` will return the general solution in parametric form.
- `scipy.lstsq` and `numpy.lstsq` expect the problem to be given in matrix form. If there is more than one possible solution, they will return the most "average" solution. However, they cannot take any positivity constraint into account: they might return a solution where one of the variables is negative. You can sometimes fix this behaviour by adding a new equation to the system to manually force a variable to be positive, then solving again.
- `scipy.linprog` expects the problem to be given in matrix form; it also expects you to specify a linear objective function, used to choose which particular solution is "best" in case there is more than one possible solution. `linprog` also considers all variables nonnegative by default, or allows you to specify explicit bounds for the variables yourself. It also allows you to add inequality constraints, in addition to the equations, if you wish.
Using sympy.solvers.solveset.linsolve
from sympy.solvers.solveset import linsolve
from sympy import symbols

w71, w77, w92 = symbols('w71 w77 w92')
eqs = [w71 + w77 - 25, w71 + w92 - 40, w77 - 15]
solution = linsolve(eqs, [w71, w77, w92])
# solution = {(10, 15, 30)}
In your example, there is only one possible solution, so `linsolve` returned that solution: w71 = 10, w77 = 15, w92 = 30.
However, in case there is more than one possible solution, `linsolve` will return a parametric form for the general solution:
x, y, z = symbols('x y z')
eqs = [x + y - 10, y + z - 20]
solution = linsolve(eqs, [x, y, z])
# solution = {(z - 10, 20 - z, z)}
Here there are infinitely many possible solutions. `linsolve` is telling us that we can pick any value for z, and then we get the corresponding x and y as x = z - 10 and y = 20 - z.
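If you later learn the value of one variable (say you weigh one of the items by hand), you can substitute it into the parametric solution to recover the others; a small sketch, assuming a hypothetical measured value z = 12:

```python
from sympy import symbols
from sympy.solvers.solveset import linsolve

x, y, z = symbols('x y z')
solution = linsolve([x + y - 10, y + z - 20], [x, y, z])
# solution = {(z - 10, 20 - z, z)}

# Suppose we weigh the item behind z by hand and find z = 12;
# substitute it into the parametric tuple to get x and y.
(particular,) = solution  # unpack the single parametric tuple
print(particular.subs(z, 12))  # (2, 8, 12)
```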
Using numpy.linalg.lstsq
`lstsq` expects the system of equations to be given in matrix form. If there is more than one possible solution, it will return the most "average" solution. For instance, if the system of equations is simply x + y = 10, then `lstsq` will return the particular solution x = 5, y = 5 and will ignore more "extreme" solutions such as x = 10, y = 0.
from numpy.linalg import lstsq

# w_71 + w_77 = 25
# w_71 + w_92 = 40
# w_77 = 15
A = [[1, 1, 0], [1, 0, 1], [0, 1, 0]]
b = [25, 40, 15]
solution = lstsq(A, b, rcond=None)
solution[0]
# array([10., 15., 30.])
Here `lstsq` found the unique solution, w71 = 10, w77 = 15, w92 = 30.
# x + y = 10
# y + z = 20
A = [[1, 1, 0], [0, 1, 1]]
b = [10, 20]
solution = lstsq(A, b, rcond=None)
solution[0]
# array([-3.55271368e-15, 1.00000000e+01, 1.00000000e+01])
Here `lstsq` had to choose a particular solution, and chose the one it considered most "average": x = 0, y = 10, z = 10. You might want to round the solution to integers.
One drawback of `lstsq` is that it doesn't take your non-negativity constraint into account. That is, it might return a solution where one of the variables is negative:
# x + y = 2
# y + z = 20
A = [[1, 1, 0], [0, 1, 1]]
b = [2, 20]
solution = lstsq(A, b, rcond=None)
solution[0]
# array([-5.33333333, 7.33333333, 12.66666667])
See how `lstsq` ignored the possible positive solution x = 1, y = 1, z = 18 and instead returned the solution it considered most "average": x = -5.33, y = 7.33, z = 12.67.
One way to fix this is to add an equation yourself to force the offending variable to be positive. For instance, here we noticed that `lstsq` wanted x to be negative, so we can manually force x to be equal to 1 instead, and solve again:
# x + y = 2
# y + z = 20
# x = 1
A = [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
b = [2, 20, 1]
solution = lstsq(A, b, rcond=None)
solution[0]
# array([ 1.,  1., 19.])
Now that we have manually forced x to be 1, `lstsq` finds the solution x = 1, y = 1, z = 19, which we're happier with.
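As an alternative to forcing variables by hand, scipy also ships a dedicated nonnegative least-squares solver, `scipy.optimize.nnls` (not used elsewhere in this answer), which minimises ||Ax - b|| subject to x >= 0 directly; a sketch on the same system:

```python
import numpy as np
from scipy.optimize import nnls

# x + y = 2
# y + z = 20
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([2.0, 20.0])

# nnls enforces nonnegativity itself, so no extra equation is needed.
solution, residual = nnls(A, b)
print(solution)  # every entry is >= 0 by construction
```

The trade-off is that `nnls` returns one particular nonnegative solution; it doesn't tell you whether others exist.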
Using scipy.optimize.linprog
The particularity of `linprog` is that it expects you to specify an "objective" function, used to choose a particular solution in case there is more than one possible solution.
Also, `linprog` allows you to specify bounds for the variables. The default is that all variables are nonnegative, which is what you want.
from scipy.optimize import linprog

# w_71 + w_77 = 25
# w_71 + w_92 = 40
# w_77 = 15
A = [[1, 1, 0], [1, 0, 1], [0, 1, 0]]
b = [25, 40, 15]
c = [1, 1, 1]  # coefficients for the objective: minimise w71 + w77 + w92
solution = linprog(c, A_eq=A, b_eq=b)
solution.x
# array([10., 15., 30.])
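If you additionally have rough prior knowledge about each item (say, every packed item weighs at least 5 and at most 50), `linprog`'s `bounds` argument can encode it; a sketch with those hypothetical limits:

```python
from scipy.optimize import linprog

# w_71 + w_77 = 25
# w_71 + w_92 = 40
# w_77 = 15
A = [[1, 1, 0], [1, 0, 1], [0, 1, 0]]
b = [25, 40, 15]
c = [1, 1, 1]  # objective: minimise w71 + w77 + w92

# Hypothetical prior knowledge: each weight lies between 5 and 50.
solution = linprog(c, A_eq=A, b_eq=b, bounds=[(5, 50)] * 3)
print(solution.x)  # [10. 15. 30.]
```

Since the system's unique solution already lies inside those bounds, the answer is unchanged; with an underdetermined system, bounds like these would narrow down which solutions are admissible.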