Replace np.nans in list with calculated values obtained from polynomial regression

Time:04-09

I have two lists of y values:

y_list1 = [45,np.nan,np.nan,np.nan, 40,50,6,2,7,np.nan, np.nan,np.nan, np.nan, np.nan]

y_list2 = [4,23,np.nan, np.nan, np.nan, np.nan, np.nan,5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]

and both sets of values were obtained at the same set of time points:

x = np.array([0,3,4,5,6,7,8,9,10,11,12,13,14,15])

The aim: Return y_list1 and y_list2 with the np.nans replaced with values, by fitting a polynomial regression to the data that is there, and then calculating the missing points.

I am able to fit the polynomial:

import sys
import numpy as np

x = np.array([0,3,4,5,6,7,8,9,10,11,12,13,14,15])

id_list = ['1','2']
list_y = np.array([[45,np.nan,np.nan,np.nan, 40,50,6,2,7,np.nan, np.nan,np.nan, np.nan, np.nan],[4,23,np.nan, np.nan, np.nan, np.nan, np.nan,5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]])

for each_id,y in zip(id_list,list_y):

        #treat the missing data
        idx = np.isfinite(x) & np.isfinite(y)

        #fit
        ab = np.polyfit(x[idx], y[idx], len(list_y[0])) 

So then I wanted to use this fit to replace the missing values in y, so I found this, and implemented:

        replace_nan = np.polyval(x,y)
        print(replace_nan)

The output is:

[2.13161598e+20            nan            nan            nan
 5.20634185e+19 7.52453405e+20 8.35884417e+09 3.27510000e+04
 5.11358666e+10            nan            nan            nan
            nan            nan]
test_polyreg.py:16: RankWarning: Polyfit may be poorly conditioned
  ab = np.polyfit(x[idx], y[idx], len(list_y[0])) #understand how many degrees
[7.45653990e+07 6.97736286e+16            nan            nan
            nan            nan            nan 9.91821285e+08
            nan            nan            nan            nan
            nan            nan]

I'm not concerned about the poor-conditioning warning, because this is just test data to try to understand how it should work. But the output still has NaNs in it (and it didn't use the fit I'd previously generated). Could someone show me how to replace the NaNs in the y values with points estimated from a polynomial regression?

Answer:

First, you should modify the ab definition as:

ab = np.polyfit(x[idx], np.array(y)[idx], idx.sum())
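
A side note on the degree: a polynomial of degree d has d + 1 coefficients, so n finite points determine a unique fit only up to degree n - 1. The sketch below is one way to cap the degree accordingly. It is an assumption on my part rather than part of the line above, and it uses the second list padded to 14 values so that it lines up with x, as the output printed in the question suggests.

```python
import numpy as np

# Time points and the second list from the question (14 values each).
x = np.array([0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=float)
y = np.array([4, 23] + [np.nan] * 5 + [5] + [np.nan] * 6, dtype=float)

# Keep only the positions where y was actually measured.
idx = np.isfinite(x) & np.isfinite(y)
n_points = int(idx.sum())  # 3 measured points for this list

# Assumption: cap the degree at n_points - 1 so the coefficients are unique.
deg = n_points - 1
ab = np.polyfit(x[idx], y[idx], deg)

print(deg, ab)
```

With the degree capped this way the polynomial passes exactly through the measured points. A lower degree gives a smoother least-squares fit instead, which may be preferable for noisy data.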

ab holds your polynomial coefficients, so you have to pass them to np.polyval as the first argument, with the x values as the second:

replace_nan = np.polyval(ab,x)
print(replace_nan)
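
To see why the argument order matters, here is a small standalone sketch with a made-up polynomial (the names coeffs and points are just for illustration). np.polyval(p, x) treats its first argument as coefficients, highest power first, and its second as the points to evaluate at. In the question's call np.polyval(x, y), the time points were used as coefficients and the y values as evaluation points, so every NaN in y came straight back out as NaN.

```python
import numpy as np

# Coefficients from the highest power down: 2*x**2 + 0*x + 1.
coeffs = np.array([2.0, 0.0, 1.0])
points = np.array([0.0, 1.0, 3.0])

# Correct order: coefficients first, evaluation points second.
values = np.polyval(coeffs, points)
print(values)  # the values are 1, 3 and 19

# A NaN among the evaluation points yields NaN at that position.
nan_out = np.polyval(coeffs, np.array([np.nan, 2.0]))
print(nan_out)  # first entry is nan, second is 9
```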

out:

[   4.           23.           26.54413638   28.01419869   27.00250156
   23.10135965   15.90308758    5.          -10.01558845  -29.55136312
  -54.01500938  -83.81421259 -119.3566581  -161.05003127]
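
Note that np.polyval(ab, x) returns the fitted value at every time point, not only at the missing ones. If you want to keep the measured values and fill in only the NaNs, something like the following sketch might work. The helper name fill_nans_with_polyfit and the default deg=2 are my own assumptions, and both lists are taken to have 14 values to match x.

```python
import numpy as np

def fill_nans_with_polyfit(x, y, deg=2):
    """Return a copy of y with NaNs replaced by polynomial-fit estimates.

    Hypothetical helper; deg=2 is an arbitrary default, not a recommendation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Fit only on positions where both x and y are finite.
    idx = np.isfinite(x) & np.isfinite(y)
    # Assumption: never request more coefficients than there are points.
    deg = min(deg, int(idx.sum()) - 1)
    ab = np.polyfit(x[idx], y[idx], deg)
    fitted = np.polyval(ab, x)
    # Keep measured values; use the fit only where y is NaN.
    return np.where(np.isnan(y), fitted, y)

x = np.array([0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
y_list1 = [45] + [np.nan] * 3 + [40, 50, 6, 2, 7] + [np.nan] * 5
y_list2 = [4, 23] + [np.nan] * 5 + [5] + [np.nan] * 6

for y in (y_list1, y_list2):
    print(fill_nans_with_polyfit(x, y))
```

Values filled in beyond the last measured time point are extrapolations, and a polynomial can drift a long way there (as the -161.05 at x = 15 in the output above shows), so treat those with caution.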