Optimizing objective function in SciPy optimize.minimize

Time:03-09

res_M = minimize(L_M, x0=x_M, args=(data, w_vector),
                 method='L-BFGS-B', bounds=[(0.001, 1), (0.001, 1), (0.001, 1)])

def L_M(x, data, w_vector):
    sum = 0
    for i in range(len(data)):
        sum += w_vector[i]*(data[i][0]*np.log(x[0]) + data[i][1]*np.log(x[1]) + data[i][2]*np.log(x[2]))
    return -1*sum

As part of an Expectation-Maximization (EM) algorithm, I am calling SciPy's optimize.minimize function in the M-step. x_M holds three values between 0 and 1, initially all 0.5. w_vector is calculated in the E-step; it is a 1D NumPy array with the same length as the data set, containing floats between 0 and 1. Each row of the data set holds three integer feature values between 0 and 3, for example [1 0 2].

The for loop in the objective function is slowing things down. I want to optimize it using vectorized calculations instead. I have tried the following, but it changes the result:

def L_M(x, data, w_vector):
        length = len(data)        
        a_i = data[np.arange(length)][0].sum()
        f_i = data[np.arange(length)][1].sum()
        l_i = data[np.arange(length)][2].sum()
        sum = (w_vector[np.arange(length)].sum())*(a_i*np.log(x[0]) + f_i*np.log(x[1]) + l_i*np.log(x[2]))
        return -1*sum

The minimize function gets called many times, and I hope to test it on some very large data sets, so any ideas on how to rewrite the objective would be much appreciated.

Answer:

You should convert all your arrays into NumPy arrays and then this can be achieved as follows:

import numpy as np

data = np.array([[1, 0, 2], [2, 1, 0]])
w_vector = np.array([0, 1])

def L_M(x : np.ndarray, data : np.ndarray, w_vector : np.ndarray):    
    result = np.sum(w_vector * np.sum(data*np.log(x), axis = 1))
    return -result

The loop body, data[i][0]*np.log(x[0]) + data[i][1]*np.log(x[1]) + data[i][2]*np.log(x[2]), multiplies each element of row i of data by the log of the corresponding element of x and sums the three products. It is replaced by np.sum(data*np.log(x), axis = 1): np.log(x) has shape (3,) and is broadcast across every row of data, the multiplication is element-wise (both are NumPy arrays), and the row-wise sums are returned as a 1D array with one entry per row.

Afterward, this array is multiplied element-wise by w_vector, which is possible because both are 1D NumPy arrays of the same length.

Finally, the sum of the resulting array is taken and saved into result. For optimization, pass x_M also as a NumPy array:

from scipy.optimize import minimize

x_M = np.array([0.5, 0.5, 0.5])

res_M = minimize(L_M, x0=x_M, args=(data, w_vector),
                 method='L-BFGS-B', bounds=[(0.001, 1), (0.001, 1), (0.001, 1)])
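
As a sanity check, it may be worth confirming that the vectorized objective returns the same value as the original loop before swapping it into the EM code. The sketch below does this on made-up random data; the names L_M_loop and L_M_vec, the data size, and the seed are assumptions chosen only for illustration.

```python
import numpy as np

def L_M_loop(x, data, w_vector):
    # Reference implementation: explicit loop over the rows of data.
    total = 0.0
    for i in range(len(data)):
        total += w_vector[i] * (data[i][0] * np.log(x[0])
                                + data[i][1] * np.log(x[1])
                                + data[i][2] * np.log(x[2]))
    return -total

def L_M_vec(x, data, w_vector):
    # Vectorized version: np.log(x) has shape (3,) and broadcasts over the rows of data.
    return -np.sum(w_vector * np.sum(data * np.log(x), axis=1))

rng = np.random.default_rng(0)               # fixed seed, made-up test data
data = rng.integers(0, 4, size=(1000, 3))    # integer features in 0..3
w_vector = rng.random(1000)                  # weights in [0, 1)
x = np.array([0.2, 0.5, 0.9])

loop_value = L_M_loop(x, data, w_vector)
vec_value = L_M_vec(x, data, w_vector)
print(np.isclose(loop_value, vec_value))     # both versions should agree up to rounding
```

If the two values disagree, the usual suspects are data or w_vector still being Python lists, or data having shape (3, n) instead of (n, 3).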

P.S.: Avoid variable names like sum. It shadows Python's built-in sum function, which IMHO is not good practice.
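
For very large data sets there is possibly a further saving. Because the objective is linear in np.log(x), it can be rewritten as -(w_vector @ data) @ np.log(x), so the three weighted column sums only need to be computed once per M-step rather than on every call made by minimize. The gradient is then -counts / x and can be passed through the jac argument. The following is a minimal sketch under those assumptions; make_objective, counts, and the small example arrays are hypothetical names and values, not part of the original code.

```python
import numpy as np
from scipy.optimize import minimize

def make_objective(data, w_vector):
    # Hypothetical helper: collapse the data set into three weighted counts once.
    # counts[j] = sum_i w_vector[i] * data[i, j], shape (3,)
    counts = w_vector @ data

    def objective(x):
        # Same value as -sum_i w_i * sum_j data[i, j] * log(x[j])
        return -(counts @ np.log(x))

    def gradient(x):
        # Partial derivative of -counts[j] * log(x[j]) with respect to x[j]
        return -counts / x

    return objective, gradient

# Made-up example inputs for illustration only.
data = np.array([[1, 0, 2], [2, 1, 0], [0, 3, 1]])
w_vector = np.array([0.2, 0.7, 0.5])
x_M = np.array([0.5, 0.5, 0.5])

objective, gradient = make_objective(data, w_vector)
res_M = minimize(objective, x0=x_M, jac=gradient,
                 method='L-BFGS-B', bounds=[(0.001, 1)] * 3)
```

With this form, each objective evaluation touches only three numbers regardless of the size of data; the helper has to be rebuilt whenever the E-step produces a new w_vector.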
