I am using curve_fit from scipy.optimize to fit some parameters of an equation. I have several arrays of X and Y training samples, plus an array of conditions for each (X, Y) pair; these conditions are also passed to the equation as parameters (and generally differ from one pair to the next). The equation is something like:
Y[i] = Equation(X[i], *C[i], *K)
with:
- X[i] a list of x-values (n lists in total)
- Y[i] a list of y-values (n lists in total)
- C[i] given parameters (n lists in total)
- K the parameters to fit
If I only had one array of each type, a lambda function would be enough, but that's not the case. The one idea I came up with is to use np.concatenate to join the arrays into a single one of each kind (X, Y and C), but I can't work out how to pass the result so that the function can handle it.
I tried several ways to perform this. One approach I came up with is by creating a class with both the X data and the conditions. As an example, it was something like this:
import numpy as np
import scipy.optimize as opt
class MyClass:
    def __init__(self, a1, a2, a3):
        self.A1 = a1
        self.A2 = a2
        self.A3 = a3
f = lambda x, b, c: b*x.A1 + c*x.A2 + x.A3
X = np.linspace(0,10,20)
MyClass_array = np.array([MyClass(element,1,2) for element in X])
Y = X + 2
opt.curve_fit(f, MyClass_array, Y)
Which gives me the following output:
TypeError: float() argument must be a string or a number, not 'MyClass'
I tried using lists in a similar way to this code:
import numpy as np
import scipy.optimize as opt
f = lambda x, b, c: b*x[0] + c*x[1] + x[2]
X = np.array([[element, 1, 2] for element in np.linspace(0,10,20)])
Y = 2 + np.linspace(0,10,20)
opt.curve_fit(f, X, Y)
Again this fails, apparently because both arrays need to have the same shape; it returns:
ValueError: operands could not be broadcast together with shapes (3,) (20,)
Lastly, I tried creating an array from two lists of the same shape: one holds the X values and the other holds indices into a separate list where the conditions are stored, similar to this code:
import numpy as np
import scipy.optimize as opt
Aux = [[1,2],[1,2]]
f = lambda x, b, c: b*x[0] + c*Aux[np.int(x[1])][0] + Aux[np.int(x[1])][0]
x1 = np.linspace(0,10,20)
x2 = np.zeros(20).astype('int')
X = np.array([x1, x2])
Y = 2 + x1
opt.curve_fit(f, X, Y)
But then again, it raises:
TypeError: only size-1 arrays can be converted to Python scalars
since I can't use the array as an index. Is there any way I can make the x2 array values be used sequentially as an index, the way the x1 values are? (Although I know x1 is not working as an index.)
Is there anything I can do in any of these scenarios to make it work?
CodePudding user response:
This is more of a comment than an answer, but it would be too long for one, and I will probably edit it beyond the 5 minute limit.
First - when talking about errors, show the full traceback; you/we need to know exactly where the error occurs. For example, is the error in curve_fit itself, in fn, or in some conversion step beforehand?
Second - make sure you understand what curve_fit expects - from the function as well as the arrays. I won't review the docs (right now), but most likely it expects numeric arrays, 1d or 2d. Object dtype arrays, or arrays of lists or custom class objects, won't work.
At a quick glance it looks like you are trying a bunch of different things without really understanding either of the above. Debugging by trying random things does not work.
If my memory is correct, fn should be something that works like fn(X, b, c), and the result should be comparable to Y. curve_fit will pass your X to it, along with trial values of b, c, and compare the result with Y. It's a good idea to do a trial calculation of your own, e.g. fn(X,1,1), and check the shape and dtype, and make sure you can subtract Y from it.
Often it helps to include a print(X) or print(X.shape) in fn so you have a clear(er) idea of how curve_fit is calling it.
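A minimal sketch of that kind of check, assuming a simple linear model (the name model and the synthetic data here are illustrative, not taken from the question):

```python
import numpy as np

def model(x, b, c):
    # Debug print: shows exactly what the caller hands to the function.
    print("model received x with shape", x.shape, "and dtype", x.dtype)
    return b * x + c

X = np.linspace(0, 10, 20)
Y = 2 + X

trial = model(X, 1, 1)      # trial call with hand-picked parameters
residual = trial - Y        # this subtraction must not raise
print(trial.shape, trial.dtype, residual.shape)
```

If the trial result does not have the same shape as Y, or the subtraction raises, the fit will fail in the same way, so it is cheaper to find that out before calling curve_fit.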
I see from the curve_fit source code that X must be float, or convertible to float:
xdata = np.asarray(xdata, float) # or actually
np.asarray_chkfinite(xdata, float) # same thing but a little more checking
Y must also be float.
edit
With your first block of code, I've added a method to the class definition.
    def __repr__(self):
        return f'MyClass <{self.A1}, {self.A2}, {self.A3}>'
So the array displays more usefully:
In [5]: MyClass_array
Out[5]:
array([MyClass <0.0, 1, 2>, MyClass <0.5263157894736842, 1, 2>,
MyClass <1.0526315789473684, 1, 2>,
MyClass <1.5789473684210527, 1, 2>,
...
MyClass <9.473684210526315, 1, 2>, MyClass <10.0, 1, 2>],
dtype=object)
Then when I try the curve_fit:
In [6]: opt.curve_fit(f, MyClass_array, Y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 opt.curve_fit(f, MyClass_array, Y)
File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:790, in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, full_output, **kwargs)
786 if isinstance(xdata, (list, tuple, np.ndarray)):
787 # `xdata` is passed straight to the user-defined `f`, so allow
788 # non-array_like `xdata`.
789 if check_finite:
--> 790 xdata = np.asarray_chkfinite(xdata, float)
791 else:
792 xdata = np.asarray(xdata, float)
File ~\anaconda3\lib\site-packages\numpy\lib\function_base.py:486, in asarray_chkfinite(a, dtype, order)
422 @set_module('numpy')
423 def asarray_chkfinite(a, dtype=None, order=None):
424 """Convert the input to an array, checking for NaNs or Infs.
425
426 Parameters
(...)
484
485 """
--> 486 a = asarray(a, dtype=dtype, order=order)
487 if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
488 raise ValueError(
489 "array must not contain infs or NaNs")
TypeError: float() argument must be a string or a number, not 'MyClass'
This is what I mean by the full error message. What I see is that it is trying to make a float dtype array (via asarray_chkfinite and asarray). astype produces the same error:
In [7]: MyClass_array.astype(float)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 MyClass_array.astype(float)
TypeError: float() argument must be a string or a number, not 'MyClass'
But what if curve_fit left your array as is, and passed it to the f function:
In [8]: f(MyClass_array,1,2)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 f(MyClass_array,1,2)
Cell In[4], line 11, in <lambda>(x, b, c)
8 def __repr__(self):
9 return f'MyClass <{self.A1}, {self.A2}, {self.A3}>'
---> 11 f = lambda x, b, c: b*x.A1 + c*x.A2 + x.A3
13 X = np.linspace(0,10,20)
15 MyClass_array = np.array([MyClass(element,1,2) for element in X])
AttributeError: 'numpy.ndarray' object has no attribute 'A1'
You wrote f to work with one MyClass object, not with a whole array of them:
In [9]: f(MyClass_array[1],1,2)
Out[9]: 4.526315789473684
2nd try
In [10]: f = lambda x, b, c: b*x[0] + c*x[1] + x[2]
...:
...: X = np.array([[element, 1, 2] for element in np.linspace(0,10,20)])
...: Y = 2 + np.linspace(0,10,20)
...:
...: opt.curve_fit(f, X, Y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[10], line 6
3 X = np.array([[element, 1, 2] for element in np.linspace(0,10,20)])
4 Y = 2 + np.linspace(0,10,20)
----> 6 opt.curve_fit(f, X, Y)
File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:834, in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, full_output, **kwargs)
831 if ydata.size != 1 and n > ydata.size:
832 raise TypeError(f"The number of func parameters={n} must not"
833 f" exceed the number of data points={ydata.size}")
--> 834 res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
835 popt, pcov, infodict, errmsg, ier = res
836 ysize = len(infodict['fvec'])
File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:410, in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
408 if not isinstance(args, tuple):
409 args = (args,)
--> 410 shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
411 m = shape[0]
413 if n > m:
File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:24, in _check_func(checker, argname, thefunc, x0, args, numinputs, output_shape)
22 def _check_func(checker, argname, thefunc, x0, args, numinputs,
23 output_shape=None):
---> 24 res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
25 if (output_shape is not None) and (shape(res) != output_shape):
26 if (output_shape[0] != 1):
File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:485, in _wrap_func.<locals>.func_wrapped(params)
484 def func_wrapped(params):
--> 485 return func(xdata, *params) - ydata
ValueError: operands could not be broadcast together with shapes (3,) (20,)
Now X is a 2d array of floats; Y is 1d of floats.
In [11]: X.shape, X.dtype
Out[11]: ((20, 3), dtype('float64'))
In [12]: Y.shape, Y.dtype
Out[12]: ((20,), dtype('float64'))
With this f, the result (given the (20,3) X array) is a (3,) array:
In [13]: f(X,1,2)
Out[13]: array([2.10526316, 4. , 8. ])
The error comes when it tries to compare the result of the func call with the ydata:
func(xdata, *params) - ydata
It can't subtract a (20,) from a (3,) array.
The curve_fit docs clearly state that it expects the func to behave like:
ydata = f(xdata, *params) + eps
Your f = lambda x, b, c: b*x[0] + c*x[1] + x[2], when given a (20,3) shape x, computes something from the first 3 rows of x.
In other words it is:
f = lambda x, b, c: b*x[0,:] + c*x[1,:] + x[2,:]
Did you want this instead?
f = lambda x, b, c: b*x[:,0] + c*x[:,1] + x[:,2]
With that f, f(X, 1,2) should give a (20,) result, from which Y can be subtracted.
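The difference between the two indexing styles is easy to see from the shapes alone. A small sketch using the same X as the second attempt (the names rows and cols are only for illustration):

```python
import numpy as np

# Same construction as in the question: 20 samples, 3 columns.
X = np.array([[v, 1, 2] for v in np.linspace(0, 10, 20)])

rows = X[0] + X[1] + X[2]            # combines the first three rows
cols = X[:, 0] + X[:, 1] + X[:, 2]   # combines the three columns

print(X.shape, rows.shape, cols.shape)
```

Only the column version has one value per sample, which is what has to line up with Y.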
3rd try
In [15]: X
Out[15]:
array([[ 0. , 0.52631579, 1.05263158, 1.57894737, 2.10526316,
2.63157895, 3.15789474, 3.68421053, 4.21052632, 4.73684211,
5.26315789, 5.78947368, 6.31578947, 6.84210526, 7.36842105,
7.89473684, 8.42105263, 8.94736842, 9.47368421, 10. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]])
In [16]: X.shape, X.dtype
Out[16]: ((2, 20), dtype('float64'))
I could show the full curve_fit traceback, but let's check directly how f works with the X. First I get a DeprecationWarning because you use np.int (deprecated since NumPy 1.20 and removed in 1.24; use the builtin int instead):
In [20]: f(X,1,2)
C:\Users\paul\AppData\Local\Temp\ipykernel_4556\3065079927.py:3: DeprecationWarning: `np.int` ...
f = lambda x, b, c: b*x[0] + c*Aux[np.int(x[1])][0] + Aux[np.int(x[1])][0]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[20], line 1
----> 1 f(X,1,2)
Cell In[14], line 3, in <lambda>(x, b, c)
1 Aux = [[1,2],[1,2]]
----> 3 f = lambda x, b, c: b*x[0] + c*Aux[np.int(x[1])][0] + Aux[np.int(x[1])][0]
5 x1 = np.linspace(0,10,20)
6 x2 = np.zeros(20).astype('int')
TypeError: only size-1 arrays can be converted to Python scalars
As with the first case, you are assuming that curve_fit passes just one "sample" to f, whereas it really passes the whole array. X[1] is an array of 20 values.
In [21]: int(X[1])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[21], line 1
----> 1 int(X[1])
TypeError: only size-1 arrays can be converted to Python scalars
The common factor in all these errors is that you did not verify that your f works with the X. You seem to be under the impression that curve_fit passes one element or row of X at a time.
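For the third attempt, one possible way to make the lookup work on the whole array at once is to store the conditions in a NumPy array and index it with an integer array. This is only a sketch: it keeps the expression from the question (including its use of column 0 of Aux in both terms) and assumes the second row of X holds valid row indices into Aux.

```python
import numpy as np

# Conditions table as an array instead of a nested list,
# so it can be indexed with a whole array of indices.
Aux = np.array([[1, 2], [1, 2]])

def f(x, b, c):
    idx = x[1].astype(int)   # converts all 20 indices in one step
    return b * x[0] + c * Aux[idx, 0] + Aux[idx, 0]

x1 = np.linspace(0, 10, 20)
x2 = np.zeros(20)            # becomes float inside X anyway
X = np.array([x1, x2])       # shape (2, 20)

out = f(X, 1, 2)
print(out.shape, out.dtype)
```

Here Aux[idx, 0] picks one condition per sample, so the result has the same length as x1 and can be compared with Y.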
2nd try revisited
The docs are a bit unclear on whether X has to be (20,), or whether it can be (20,3). Let's define f to work with a 3-column X:
In [24]: f = lambda x, b, c: b*x[:,0] + c*x[:,1] + x[:,2]
Then with a trial call, f produces a (20,) array which can be tested against Y:
In [25]: f(X, 1, 2)
Out[25]:
array([ 4. , 4.52631579, 5.05263158, 5.57894737, 6.10526316,...
11.89473684, 12.42105263, 12.94736842, 13.47368421, 14. ])
In [26]: f(X, 1, 2)-Y
Out[26]:
array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
2., 2., 2.])
And if we do the curve_fit:
In [27]: opt.curve_fit(f, X, Y)
Out[27]:
(array([ 1.00000000e+00, -2.18110054e-12]),
array([[ 2.11191137e-27, -2.44049320e-31],
[-2.44049320e-31, 1.93496881e-33]]))
It runs and gives a result. Testing the 2 parameters it found:
In [28]: f(X, 1, 0)
Out[28]:
array([ 2. , 2.52631579, 3.05263158, 3.57894737, 4.10526316,...
9.89473684, 10.42105263, 10.94736842, 11.47368421, 12. ])
In [29]: f(X, 1, 0)-Y
Out[29]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.])
This is kind of a null case since your Y doesn't have any noise. It's just a function of np.linspace(0,10,20), the same as X. But it does show that curve_fit works with a (n,3) X, provided the f is correct.
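The same idea may extend to the original problem of several data sets with different conditions: repeat each C[i] once per sample, put x and the conditions in columns, and concatenate everything into one 2d float array. The sketch below assumes a made-up equation with two conditions and two fitted parameters; equation, model, Xs, Cs and true_K are hypothetical stand-ins, not the real equation from the question.

```python
import numpy as np
import scipy.optimize as opt

def equation(x, c0, c1, k0, k1):
    # Hypothetical stand-in for the real equation.
    return k0 * x + k1 * c0 + c1

def model(xc, k0, k1):
    # xc has shape (n_samples, 3): columns are x, c0, c1.
    return equation(xc[:, 0], xc[:, 1], xc[:, 2], k0, k1)

# Three data sets of different lengths, each with its own conditions.
Xs = [np.linspace(0, 1, 5), np.linspace(0, 2, 8), np.linspace(1, 3, 6)]
Cs = [(1.0, 0.5), (2.0, -1.0), (3.0, 0.0)]
true_K = (2.0, -0.5)
Ys = [equation(x, *c, *true_K) for x, c in zip(Xs, Cs)]

# Repeat each condition for every sample of its data set, then stack.
XC = np.concatenate([
    np.column_stack([x, np.full(x.shape, c[0]), np.full(x.shape, c[1])])
    for x, c in zip(Xs, Cs)
])
Y = np.concatenate(Ys)

popt, pcov = opt.curve_fit(model, XC, Y)
print(XC.shape, Y.shape, popt)
```

As with the example above, this is noise-free synthetic data, so it only shows that the shapes line up and the fit runs; with real data a sensible p0 may be needed.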