Constrained Multi-Linear Regression using Gekko


I have a multilinear regression problem where I have prior information about the range of the output (dependent variable y): the prediction must always lie in that range.

I want to find the coefficients (upper and lower bound) of each feature (independent variables) in order to make the linear regression model restricted to the desired range of output.

For example, I want to apply this solution on sklearn.datasets.load_boston where I know that the price of the house will be in the range of [10, 40] (the actual min and max values of y are [5,50]).

In the following example, my objective is 10 < yp < 40, and based on this I want to find the min and max bounds of all the coefficients.

from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
from gekko import GEKKO

# Loading the dataset
x, y = load_boston(return_X_y=True)

m = GEKKO()
c  = m.Array(m.FV, x.shape[1]+1) #array of parameters and intercept

for ci in c:
    ci.STATUS = 1 #calculate fixed parameter
    ci.LOWER  = 1 #constraint: lower limit
    ci.UPPER  = 0 #constraint: upper limit

#Define variables
xd = m.Array(m.Param,x.shape[1])
for i in range(x.shape[1]):
    xd[i].value = x[i]
yd = m.Param(y); yp = m.Var()

#Equation of Linear Functions
y_pred = c[0]*xd[0] + \
         c[1]*xd[1] + \
         c[2]*xd[2] + \
         c[3]*xd[3] + \
         c[4]*xd[4] + \
         c[5]*xd[5] + \
         c[6]*xd[6] + \
         c[7]*xd[7] + \
         c[8]*xd[8] + \
         c[9]*xd[9] + \
         c[10]*xd[10] + \
         c[11]*xd[11] + \
         c[12]*xd[12] + \
         c[13]

#Inequality Constraints
yp = m.Var(lb=5,ub=40)
m.Equation(yp==y_pred)

#Minimize difference between actual and predicted y
m.Minimize((yd-yp)**2)

#Solve
m.solve(disp=True)

#Retrieve parameter values
a = [i.value[0] for i in c]

print(a) 

The listed code gives me a "solution not found" error. Can you point out what I'm doing wrong here, or am I missing something important? Also, how can I get both the upper and lower bounds of c for all independent variables?

CodePudding user response:

Try IMODE=2 for regression mode. There are a few modifications, such as x[:,i] to load the data, ci.LOWER=0 as the lower bound, and ci.UPPER=1 as the upper bound.

from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
from gekko import GEKKO
m = GEKKO(remote=False)
# Loading the dataset
x, y = load_boston(return_X_y=True)
n = x.shape[1]
c  = m.Array(m.FV, n+1) #array of parameters and intercept
for ci in c:
    ci.STATUS = 1 #calculate fixed parameter
    ci.LOWER  = 0 #constraint: lower limit
    ci.UPPER  = 1 #constraint: upper limit
#Load data
xd = m.Array(m.Param,n)
for i in range(n):
    xd[i].value = x[:,i]
yd = m.Param(y)
#Equation of Linear Functions
yp = m.Var(lb=5,ub=40)
m.Equation(yp==m.sum([c[i]*xd[i] \
               for i in range(n)]) + c[13])
#Minimize difference between actual and predicted y
m.Minimize((yd-yp)**2)
#Regression mode
m.options.IMODE=2
#APOPT solver
m.options.SOLVER = 1
#Solve
m.solve(disp=True)
#Retrieve parameter values
a = [i.value[0] for i in c]
print(a) 

The solution to this constrained problem is obtained with either the IPOPT or APOPT solver (APOPT is slightly faster).

 Number of state variables:    7604
 Number of total equations: -  7590
 Number of slack variables: -  0
 ---------------------------------------
 Degrees of freedom       :    14
 
 ----------------------------------------------
 Model Parameter Estimation with APOPT Solver
 ----------------------------------------------
 
 Iter    Objective  Convergence
    0  5.60363E+04  1.25000E+00
    1  1.87753E+05  6.66134E-16
    2  3.06630E+04  9.99201E-16
    3  3.06630E+04  3.84187E-16
    5  3.06630E+04  1.35308E-16
 Successful solution
 
 ---------------------------------------------------
 Solver         :  APOPT (v1.0)
 Solution time  :  0.48429999999999995 sec
 Objective      :  30662.976306748842
 Successful solution
 ---------------------------------------------------
 

[0.0, 0.11336851243, 0.0, 1.0, 0.43491543768, 
 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.037613878067,
 0.0, 1.0]
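On the follow-up question about checking the result: the bound on yp is enforced only at the training points through the equation yp==y_pred, so it is worth confirming that the reported coefficients keep predictions in range. A minimal sketch with plain NumPy, using a small constant stand-in matrix (an assumption for illustration) in place of the Boston feature matrix x loaded above:

```python
import numpy as np

# Coefficients reported by the solver above: 13 slopes plus an intercept
c = np.array([0.0, 0.11336851243, 0.0, 1.0, 0.43491543768,
              1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.037613878067,
              0.0, 1.0])

def predict(X, c):
    # Linear model: slopes on the features, intercept as the last entry
    return X @ c[:-1] + c[-1]

# With the real dataset, X would be the Boston feature matrix x;
# here a small constant stand-in illustrates the range check.
X = np.full((3, 13), 5.0)
yp = predict(X, c)
in_range = (yp >= 5) & (yp <= 40)
print(yp)
print(in_range)
```

Note that on unseen data this check can fail: the coefficient bounds alone do not bound the output, so the [5, 40] guarantee only holds where the equation constraints were imposed.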