Numpy Array fill empty data to "uniformity"

Time:07-29

I have a 2D NumPy array with missing data, and I want to fill the gaps so that the values change smoothly ("uniformly") across the array. The array looks something like this:

[[72829],
 [nan],
 [73196],
 [73087],
 [nan],
 [nan],
 [72294.5]]

I want to fill those empty cells with the mean between the closest known cells, so that the result looks something like this:

[[72829],
 [73012.5],
 [73196],
 [73087],
 [72888.875],
 [72492.625],
 [72294.5]]

I tried SimpleImputer and KNNImputer from scikit-learn, but all I got was the same value in every missing cell, not the mean between the neighbouring cells described above. This is the code:

for label, column in data.items():  # iteritems() was removed in pandas 2.0
    reshaped = np.array(column.values)  # create a NumPy array for scikit-learn
    reshaped = reshaped.reshape(-1, 1)  # reshape the data to a 2D column array
    normalized = imputer.fit_transform(reshaped)  # impute the missing values
    data[label] = normalized  # replace the column with the imputed values

With KNNImputer I got something like this (which is not what I want):

[[72829],
 [68088.71106114],
 [73196],
 [73087],
 [68088.71106114],
 [68088.71106114],
 [72294.5]]

Does anyone know an idea or algorithm that could give this kind of "uniformity" to the array values? The goal is to be able to plot graphs without missing data. A solution based on pandas, NumPy, or scikit-learn would be preferable. Thanks.

CodePudding user response:

Convert the data to a DataFrame, then average the results of bfill (backward fill) and ffill (forward fill):

import numpy as np
import pandas as pd

x = [[72829],
 [np.nan],
 [73196],
 [73087],
 [np.nan],
 [np.nan],
 [72294.5]]
df = pd.DataFrame(x)
df = (df[0].bfill() + df[0].ffill()) / 2
df
>>>
0    72829.00
1    73012.50
2    73196.00
3    73087.00
4    72690.75
5    72690.75
6    72294.50
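
Note that with this approach every cell in a run of consecutive NaNs receives the same value, namely the midpoint of the two nearest known neighbours (here 72690.75 for rows 4 and 5). The sketch below repeats the computation on a Series to make that visible; the variable names are illustrative only. A leading or trailing NaN would stay NaN, because one of the two fills has nothing to copy from.

```python
import numpy as np
import pandas as pd

# Same data as in the question, as a 1-D Series.
s = pd.Series([72829, np.nan, 73196, 73087, np.nan, np.nan, 72294.5])

# bfill copies the next known value backwards, ffill copies the previous
# known value forwards; their average is the midpoint of the two neighbours.
filled = (s.bfill() + s.ffill()) / 2

print(filled.tolist())
# [72829.0, 73012.5, 73196.0, 73087.0, 72690.75, 72690.75, 72294.5]
```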

CodePudding user response:

In[0]:
import pandas as pd

series = pd.Series([72829,
 None,
 73196,
 73087,
 None,
 None,
 72294.5])

series.interpolate(method='linear')

Out[0]:
0    72829.000000
1    73012.500000
2    73196.000000
3    73087.000000
4    72822.833333
5    72558.666667
6    72294.500000
dtype: float64
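
Since the question loops over the columns of a DataFrame and starts from a NumPy array, here is a minimal sketch of both variants. It assumes a small two-column frame standing in for the asker's `data` (the column names and the second column are made up), and `fill_linear` is a hypothetical helper, not a library function. `DataFrame.interpolate` handles all columns at once, and `np.interp` does the same linear interpolation over row positions for a plain array that contains at least one known value.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the asker's `data`.
data = pd.DataFrame({
    "a": [72829, np.nan, 73196, 73087, np.nan, np.nan, 72294.5],
    "b": [1.0, 2.0, np.nan, 4.0, np.nan, np.nan, 7.0],
})

# pandas: interpolate every column in one call, no per-column loop needed.
filled_df = data.interpolate(method="linear")


def fill_linear(values):
    """Linearly interpolate NaNs in a 1-D or (n, 1) array over row positions."""
    arr = np.asarray(values, dtype=float)
    flat = arr.ravel().copy()
    missing = np.isnan(flat)
    idx = np.arange(flat.size)
    # Evaluate the piecewise-linear curve through the known points
    # at the positions of the missing ones.
    flat[missing] = np.interp(idx[missing], idx[~missing], flat[~missing])
    return flat.reshape(arr.shape)


# NumPy only: works directly on a 2D column array like the one in the question.
filled_np = fill_linear(data["a"].to_numpy().reshape(-1, 1))

print(filled_df)
print(filled_np.ravel())
```

Both variants weight the gap by position, so rows 4 and 5 get 72822.83 and 72558.67 rather than a shared midpoint.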