Pass array of classes/lists/indices as input argument for scipy.optimize.curve_fit

I am using curve_fit from scipy.optimize to fit some parameters of one equation. I find myself with several arrays of Xs and Ys training data samples and also arrays of conditions for each pair (X,Y) which are also parameters that are given to the equation (and are not equal in general). The equation is something like:

Y[i] = Equation(X[i], *C[i], *K)

with:

X[i] a list of x-values (n lists in total)
Y[i] a list of y-values (n lists in total)
C[i] given parameters (n lists in total)
K the parameters to fit

If I only had one array of each type a lambda function would be enough, but that's not the case. The one idea I came up with is somehow using np.concatenate to join the arrays in just one of each kind (X, Y and C), but I find myself unable to pass it properly so that the function can work it out.

I tried several ways to perform this. One approach I came up with is by creating a class with both the X data and the conditions. As an example, it was something like this:

import numpy as np
import scipy.optimize as opt

class MyClass:
    def __init__(self, a1, a2, a3):
        self.A1 = a1
        self.A2 = a2
        self.A3 = a3

f = lambda x, b, c: b*x.A1 + c*x.A2 + x.A3

X = np.linspace(0,10,20)

MyClass_array = np.array([MyClass(element,1,2) for element in X])

Y = X + 2

opt.curve_fit(f, MyClass_array, Y)

Which gives me the following output:

TypeError: float() argument must be a string or a number, not 'MyClass'

I tried using lists in a similar way to this code:

import numpy as np
import scipy.optimize as opt

f = lambda x, b, c: b*x[0] + c*x[1] + x[2]

X = np.array([[element, 1, 2] for element in np.linspace(0,10,20)])
Y = 2 + np.linspace(0,10,20)

opt.curve_fit(f, X, Y)

Again, this fails, apparently because both arrays need to have the same shape. It returns:

ValueError: operands could not be broadcast together with shapes (3,) (20,)

Lastly, I tried to create an array with two lists, both with the same shape so that one would be the X and the other the positions of the conditions which would be stored on another list in a similar way to this code:

import numpy as np
import scipy.optimize as opt

Aux = [[1,2],[1,2]]

f = lambda x, b, c: b*x[0] + c*Aux[np.int(x[1])][0] + Aux[np.int(x[1])][0]

x1 = np.linspace(0,10,20)
x2 = np.zeros(20).astype('int')
X = np.array([x1, x2])

Y = 2 + x1

opt.curve_fit(f, X, Y)

But then again, it raises:

TypeError: only size-1 arrays can be converted to Python scalars

since I can't use the array as an index. Is there any way I can make the x2 array values go sequentially as an index as the x1's are? (although I know x1 is not working as an index)

Is there anything I can do in any of these scenarios to make it work?

Answer:

This is more of a comment than an answer, but it would be too long for one, and I will probably keep editing it beyond the 5-minute limit.

First - when talking about errors, show the full traceback; you/we need to know exactly where the error occurs. For example, is the error in curve_fit itself, in fn, or in some conversion step beforehand?

Second - make sure you understand what curve_fit expects - from the function as well as the arrays. I won't review the docs (right now), but most likely it expects number arrays, 1 or 2d. Object dtype arrays, or arrays of lists or custom class objects won't work.

At a quick glance it looks like you are trying a bunch of different things without really understanding either of the above. Debugging by trying random things does not work.

If my memory is correct the fn should be something that works like fn(X, b, c), and the result should be comparable to Y. curve_fit will pass your X to it, along with trial values of b,c, and compare the result with Y. It's a good idea to do a trial calculation of your own, e.g.

fn(X,1,1)

and check the shape and dtype, and make sure you can subtract Y from it.

Often it helps to include a print(X) or print(X.shape) in fn so you have a clearer idea of how curve_fit is calling it.
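
The trial call described above can be wrapped in a small helper. This is only a sketch: check_model is a hypothetical name (not part of SciPy), and the data is the (20,3) array from the question's second attempt. It calls fn once on the whole X, the way curve_fit would, and reports whether the result has the same shape as Y.

```python
import numpy as np

def check_model(fn, X, Y, *trial_params):
    """Hypothetical helper: call fn once on the whole X and compare shapes with Y."""
    X = np.asarray(X, dtype=float)   # curve_fit performs a similar float conversion
    Y = np.asarray(Y, dtype=float)
    out = np.asarray(fn(X, *trial_params))
    print("X:", X.shape, X.dtype, "| fn(X, ...):", out.shape, out.dtype, "| Y:", Y.shape)
    return out.shape == Y.shape

x = np.linspace(0, 10, 20)
X = np.column_stack([x, np.ones(20), 2 * np.ones(20)])   # shape (20, 3)
Y = x + 2

by_rows = lambda x, b, c: b * x[0] + c * x[1] + x[2]            # indexes rows of X
by_cols = lambda x, b, c: b * x[:, 0] + c * x[:, 1] + x[:, 2]   # indexes columns of X

rows_ok = check_model(by_rows, X, Y, 1, 1)
cols_ok = check_model(by_cols, X, Y, 1, 1)
```

If the helper returns False, curve_fit will not be able to subtract Y from the output of fn either.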

I see from the curve_fit source code, that X must be float, or convertible to float:

xdata = np.asarray(xdata, float)    # or actually
np.asarray_chkfinite(xdata, float)  # same thing but a little more checking

Y must also be float.

edit

With your first block of code, I've added a method to the class definition.

def __repr__(self):
    return f'MyClass <{self.A1}, {self.A2}, {self.A3}>'

So the array displays more usefully:

In [5]: MyClass_array
Out[5]: 
array([MyClass <0.0, 1, 2>, MyClass <0.5263157894736842, 1, 2>,
       MyClass <1.0526315789473684, 1, 2>,
       MyClass <1.5789473684210527, 1, 2>,
       ...
       MyClass <9.473684210526315, 1, 2>, MyClass <10.0, 1, 2>],
      dtype=object)

Then when I try the curve_fit:

In [6]: opt.curve_fit(f, MyClass_array, Y)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 opt.curve_fit(f, MyClass_array, Y)

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:790, in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, full_output, **kwargs)
    786 if isinstance(xdata, (list, tuple, np.ndarray)):
    787     # `xdata` is passed straight to the user-defined `f`, so allow
    788     # non-array_like `xdata`.
    789     if check_finite:
--> 790         xdata = np.asarray_chkfinite(xdata, float)
    791     else:
    792         xdata = np.asarray(xdata, float)

File ~\anaconda3\lib\site-packages\numpy\lib\function_base.py:486, in asarray_chkfinite(a, dtype, order)
    422 @set_module('numpy')
    423 def asarray_chkfinite(a, dtype=None, order=None):
    424     """Convert the input to an array, checking for NaNs or Infs.
    425 
    426     Parameters
   (...)
    484 
    485     """
--> 486     a = asarray(a, dtype=dtype, order=order)
    487     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
    488         raise ValueError(
    489             "array must not contain infs or NaNs")
TypeError: float() argument must be a string or a number, not 'MyClass'

This is what I mean by the full error message. What I see is that it is trying to make a float dtype array (via asarray_chkfinite and asarray). astype produces the same error:

In [7]: MyClass_array.astype(float)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 MyClass_array.astype(float)

TypeError: float() argument must be a string or a number, not 'MyClass'

But what if curve_fit left your array as is, and passed it to the f function:

In [8]: f(MyClass_array,1,2)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 f(MyClass_array,1,2)

Cell In[4], line 11, in <lambda>(x, b, c)
      8     def __repr__(self):
      9         return f'MyClass <{self.A1}, {self.A2}, {self.A3}>'
---> 11 f = lambda x, b, c: b*x.A1 + c*x.A2 + x.A3
     13 X = np.linspace(0,10,20)
     15 MyClass_array = np.array([MyClass(element,1,2) for element in X])

AttributeError: 'numpy.ndarray' object has no attribute 'A1'

You wrote f to work with one MyClass object, not with a whole array of them:

In [9]: f(MyClass_array[1],1,2)
Out[9]: 4.526315789473684
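
One possible way around this, sketched under the assumption that every object carries numeric A1, A2 and A3 attributes, is to unpack the objects into a plain float array before calling curve_fit and to write f so that it works on whole columns. The p0 values are arbitrary starting guesses chosen for this illustration.

```python
import numpy as np
import scipy.optimize as opt

class MyClass:
    def __init__(self, a1, a2, a3):
        self.A1 = a1
        self.A2 = a2
        self.A3 = a3

x = np.linspace(0, 10, 20)
objects = [MyClass(element, 1, 2) for element in x]
Y = x + 2

# Unpack the attributes into a (20, 3) float array; curve_fit needs numbers, not objects.
X = np.array([[o.A1, o.A2, o.A3] for o in objects], dtype=float)

# f receives the whole array at once, so it indexes columns rather than attributes.
f = lambda x, b, c: b * x[:, 0] + c * x[:, 1] + x[:, 2]

popt, pcov = opt.curve_fit(f, X, Y, p0=[0.5, 0.5])
print(popt)
```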

2nd try

In [10]: f = lambda x, b, c: b*x[0] + c*x[1] + x[2]
    ...: 
    ...: X = np.array([[element, 1, 2] for element in np.linspace(0,10,20)])
    ...: Y = 2 + np.linspace(0,10,20)
    ...: 
    ...: opt.curve_fit(f, X, Y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 6
      3 X = np.array([[element, 1, 2] for element in np.linspace(0,10,20)])
      4 Y = 2 + np.linspace(0,10,20)
----> 6 opt.curve_fit(f, X, Y)

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:834, in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, full_output, **kwargs)
    831 if ydata.size != 1 and n > ydata.size:
    832     raise TypeError(f"The number of func parameters={n} must not"
    833                     f" exceed the number of data points={ydata.size}")
--> 834 res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
    835 popt, pcov, infodict, errmsg, ier = res
    836 ysize = len(infodict['fvec'])

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:410, in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
    408 if not isinstance(args, tuple):
    409     args = (args,)
--> 410 shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
    411 m = shape[0]
    413 if n > m:

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:24, in _check_func(checker, argname, thefunc, x0, args, numinputs, output_shape)
     22 def _check_func(checker, argname, thefunc, x0, args, numinputs,
     23                 output_shape=None):
---> 24     res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
     25     if (output_shape is not None) and (shape(res) != output_shape):
     26         if (output_shape[0] != 1):

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:485, in _wrap_func.<locals>.func_wrapped(params)
    484 def func_wrapped(params):
--> 485     return func(xdata, *params) - ydata

ValueError: operands could not be broadcast together with shapes (3,) (20,)

Now X is a 2d array of floats; Y is 1d of floats.

In [11]: X.shape, X.dtype
Out[11]: ((20, 3), dtype('float64'))

In [12]: Y.shape, Y.dtype
Out[12]: ((20,), dtype('float64'))

With this f, the result (given the (20,3) X array) is a (3,) array:

In [13]: f(X,1,2)
Out[13]: array([2.10526316, 4.        , 8.        ])

The error comes when it tries to compare the result of the func call with the ydata:

func(xdata, *params) - ydata

It can't subtract a (20,) from a (3,) array.

The curve_fit docs clearly state that it expects the func to behave like:

ydata = f(xdata, *params) + eps

Your f = lambda x, b, c: b*x[0] + c*x[1] + x[2], when given a (20,3) shape x, computes something from the first 3 rows of x.

In other words, it is equivalent to:

f = lambda x, b, c: b*x[0,:] + c*x[1,:] + x[2,:]

Did you want this instead?

f = lambda x, b, c: b*x[:,0] + c*x[:,1] + x[:,2]

That f(X, 1, 2) should give a (20,) result, from which Y can be subtracted.

3rd try

In [15]: X
Out[15]: 
array([[ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
         2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
         5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
         7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

In [16]: X.shape, X.dtype
Out[16]: ((2, 20), dtype('float64'))

I could show the full curve_fit traceback, but let's check directly how f works with the X. First I get a DeprecationWarning because you use np.int:

In [20]: f(X,1,2)
C:\Users\paul\AppData\Local\Temp\ipykernel_4556\3065079927.py:3: DeprecationWarning: `np.int` ...
  f = lambda x, b, c: b*x[0] + c*Aux[np.int(x[1])][0] + Aux[np.int(x[1])][0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 f(X,1,2)

Cell In[14], line 3, in <lambda>(x, b, c)
      1 Aux = [[1,2],[1,2]]
----> 3 f = lambda x, b, c: b*x[0] + c*Aux[np.int(x[1])][0] + Aux[np.int(x[1])][0]
      5 x1 = np.linspace(0,10,20)
      6 x2 = np.zeros(20).astype('int')

TypeError: only size-1 arrays can be converted to Python scalars

As with the first case, you are assuming that curve_fit passes just one "sample" to f, whereas it really passes the whole array. X[1] is an array of 20 values.

In [21]: int(X[1])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 int(X[1])

TypeError: only size-1 arrays can be converted to Python scalars

The common factor in all these errors is that you did not verify that your f works with the X. You seem to be under the impression that curve_fit passes one element or row of X at a time.
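
For the third attempt, one option is to let NumPy do the lookup for all samples at once. The sketch below assumes Aux can be stored as a 2d NumPy array and that the index row is converted back to integers inside f (curve_fit hands it over as floats). The model is otherwise the one from the question, and p0 is an arbitrary starting guess.

```python
import numpy as np
import scipy.optimize as opt

Aux = np.array([[1, 2], [1, 2]], dtype=float)   # the lookup table as an array

def f(x, b, c):
    idx = x[1].astype(int)    # row 1 holds the positions; they arrive as floats
    # Aux[idx, 0] picks one value per sample, so the result has shape (20,)
    return b * x[0] + c * Aux[idx, 0] + Aux[idx, 0]

x1 = np.linspace(0, 10, 20)
x2 = np.zeros(20).astype('int')
X = np.array([x1, x2])        # shape (2, 20), float dtype
Y = 2 + x1

popt, pcov = opt.curve_fit(f, X, Y, p0=[0.5, 0.5])
print(popt)
```

Because every Aux[idx, 0] is 1 here, the model reduces to b*x1 + c + 1, so fitting Y = 2 + x1 should land near b = 1, c = 1.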

2nd try revisited

The docs are a bit unclear on whether X has to be (20,) or can be (20,3). Let's define f to work with a 3-column X:

In [24]: f = lambda x, b, c: b*x[:,0] + c*x[:,1] + x[:,2]

Then with a trial call f produces a (20,) array which can be tested against Y:

In [25]: f(X, 1, 2)
Out[25]: 
array([ 4.        ,  4.52631579,  5.05263158,  5.57894737,  6.10526316,...
       11.89473684, 12.42105263, 12.94736842, 13.47368421, 14.        ])

In [26]: f(X, 1, 2)-Y
Out[26]: 
array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])

And if we run curve_fit:

In [27]: opt.curve_fit(f, X, Y)
Out[27]: 
(array([ 1.00000000e+00, -2.18110054e-12]),
 array([[ 2.11191137e-27, -2.44049320e-31],
        [-2.44049320e-31,  1.93496881e-33]]))

It runs and gives a result. Testing the 2 parameters it found:

In [28]: f(X, 1, 0)
Out[28]: 
array([ 2.        ,  2.52631579,  3.05263158,  3.57894737,  4.10526316,...
        9.89473684, 10.42105263, 10.94736842, 11.47368421, 12.        ])

In [29]: f(X, 1, 0)-Y
Out[29]: 
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])

This is kind of a null case since your Y doesn't have any noise. It's just a function of np.linspace(0,10,20), the same as X. But it does show that curve_fit works with an (n,3) X, provided that f is correct.
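
The same idea may extend to the original problem with several X[i], Y[i] and C[i]. The following is only a sketch: equation is a made-up stand-in for the real Equation, the data and K_true are invented for illustration, and each sample becomes one row [x, c0, c1] with the conditions of its dataset repeated alongside it.

```python
import numpy as np
import scipy.optimize as opt

def equation(x, c0, c1, k0, k1):
    # Hypothetical model standing in for the real Equation(X[i], *C[i], *K).
    return k0 * c0 * x + k1 + c1

# Three datasets of different lengths, each with its own conditions C[i].
X_list = [np.linspace(0, 1, 5), np.linspace(0, 2, 8), np.linspace(1, 3, 6)]
C_list = [(1.0, 0.0), (2.0, 0.5), (0.5, -1.0)]
K_true = (3.0, 0.7)   # invented values, used only to generate noise-free Y
Y_list = [equation(x, *c, *K_true) for x, c in zip(X_list, C_list)]

# One row per sample: [x, c0, c1]; the conditions are repeated for every x.
X_all = np.vstack([
    np.column_stack([x, np.tile(c, (len(x), 1))])
    for x, c in zip(X_list, C_list)
])
Y_all = np.concatenate(Y_list)

def model(xdata, k0, k1):
    x, c0, c1 = xdata.T          # unpack the three columns as 1d arrays
    return equation(x, c0, c1, k0, k1)

popt, pcov = opt.curve_fit(model, X_all, Y_all, p0=[1.0, 0.0])
print(X_all.shape, popt)
```

As before, it is worth calling model(X_all, 1, 1) by hand first and checking that the result has the same shape as Y_all.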