I am trying to do a manual MLE estimation using scipy. My dataset is not that large, so it surprises me that the intermediate values get very large very fast and scipy.optimize.minimize runs into NaNs for my density almost immediately. I've tried using the sum of logarithms instead of the product of the densities, but that did not help at all.
This is my data:
[
1.000, 1.000, 1.000, 1.004, 1.005, 1.008, 1.014, 1.015, 1.023, 1.035, 1.038,
1.046, 1.048, 1.050, 1.050, 1.052, 1.052, 1.057, 1.063, 1.070, 1.070, 1.076,
1.087, 1.090, 1.091, 1.096, 1.101, 1.102, 1.113, 1.114, 1.120, 1.130, 1.131,
1.150, 1.152, 1.154, 1.155, 1.162, 1.170, 1.177, 1.189, 1.191, 1.193, 1.200,
1.200, 1.200, 1.200, 1.205, 1.210, 1.218, 1.238, 1.238, 1.241, 1.250, 1.250,
1.256, 1.257, 1.272, 1.278, 1.289, 1.299, 1.300, 1.316, 1.331, 1.349, 1.374,
1.378, 1.382, 1.396, 1.426, 1.429, 1.439, 1.443, 1.446, 1.473, 1.475, 1.478,
1.499, 1.506, 1.559, 1.568, 1.594, 1.609, 1.626, 1.649, 1.650, 1.669, 1.675,
1.687, 1.715, 1.720, 1.735, 1.750, 1.755, 1.787, 1.797, 1.805, 1.898, 1.908,
1.940, 1.989, 2.010, 2.012, 2.024, 2.047, 2.081, 2.085, 2.097, 2.136, 2.178,
2.181, 2.193, 2.200, 2.220, 2.301, 2.354, 2.359, 2.382, 2.409, 2.418, 2.430,
2.477, 2.500, 2.534, 2.572, 2.588, 2.591, 2.599, 2.660, 2.700, 2.700, 2.744,
2.845, 2.911, 2.952, 3.006, 3.021, 3.048, 3.059, 3.092, 3.152, 3.276, 3.289,
3.440, 3.447, 3.498, 3.705, 3.870, 3.896, 3.969, 4.000, 4.009, 4.196, 4.202,
4.311, 4.467, 4.490, 4.601, 4.697, 5.100, 5.120, 5.136, 5.141, 5.165, 5.260,
5.329, 5.778, 5.794, 6.285, 6.460, 6.917, 7.295, 7.701, 8.032, 8.142, 8.864,
9.263, 9.359, 10.801, 11.037, 11.504, 11.933, 11.998, 12.000, 14.153, 15.000,
15.398, 19.793, 23.150, 27.769, 28.288, 34.325, 42.691, 62.037, 77.839
]
How can I perform the MLE without running into overflow issues?
In case you are wondering, I am trying to fit the function f(x) = alpha * 500000^alpha / x^(alpha + 1). So what I have is:
import numpy as np
from scipy.optimize import minimize
# data = the given dataset
log_pareto_pdf = lambda alpha, x: np.log(alpha * (5e5 ** alpha) / (x ** (alpha + 1)))
ret = minimize(lambda alpha: -np.sum([log_pareto_pdf(alpha, x) for x in data]), x0=np.array([1]))
Answer:
You need to avoid the giant exponentiations. One way to do this is to actually simplify your function:
log_pareto_pdf = lambda alpha, x: np.log(alpha) + alpha*np.log(5e5) - (alpha + 1)*np.log(x)
Without simplifying, your program still has to evaluate the 5e5 ** alpha term, which gets really big really fast (it overflows a double once alpha reaches about 55).
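You can see that threshold directly; a quick illustrative check with NumPy scalars:
import numpy as np
print(np.float64(5e5) ** 54)  # roughly 5.5e307, still finite
print(np.float64(5e5) ** 55)  # inf, with a RuntimeWarning about overflow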
You'll also need to supply the bounds argument to minimize to keep alpha positive, so there are no negative arguments inside the logs.
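Putting the two suggestions together, a minimal sketch of the full fit might look like this (the lower bound of 1e-6 on alpha is just an illustrative choice to keep it strictly positive, and data is assumed to be the array from the question):
import numpy as np
from scipy.optimize import minimize

# data = the given dataset, as a 1-D NumPy array

# log-density with the exponentiation simplified away
log_pareto_pdf = lambda alpha, x: np.log(alpha) + alpha * np.log(5e5) - (alpha + 1) * np.log(x)

# negative log-likelihood, vectorised over the whole dataset
neg_log_likelihood = lambda params: -np.sum(log_pareto_pdf(params[0], data))

# bounds keep alpha strictly positive so np.log(alpha) stays defined
ret = minimize(neg_log_likelihood, x0=np.array([1.0]), bounds=[(1e-6, None)])
print(ret)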