How to fill missing values using start and end values?

Time:11-25

I have a start value and an end value, with n missing values between them. The logic for filling a missing value is to take the average of the known values on either side of it, starting with the start and end values.

Pseudo-code:

start_index = 0
end_index = len(l) - 1
while l has missing values:
    mid = (start_index + end_index) // 2
    l[mid] = (l[start_index] + l[end_index]) / 2
    update start_index and end_index

For example, with start = 10, end = 20 and 4 missing values:

Step 1: l = [10, nan, nan, nan, nan, 20]

Step 2: l = [10, nan, 15, nan, nan, 20]  => (10 + 20)/2 = 15

Step 3: l = [10, 12.5, 15, nan, nan, 20] => (10 + 15)/2 = 12.5

Step 4: l = [10, 12.5, 15, 17.5, nan, 20] => (15 + 20)/2 = 17.5

Step 5: l = [10, 12.5, 15, 17.5, 18.75, 20] => (17.5 + 20)/2 = 18.75

How can I do this in Python or pandas?

I have not been able to find an optimised solution for this. Any help would be appreciated.

CodePudding user response:

You can simply use pandas.DataFrame.interpolate.

What you want to do is simple linear interpolation between the start and end values to fill the NA values, which is what the pd.DataFrame.interpolate function does. You can also set limit_direction='both', which helps with consecutive NaNs.
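
As a rough sketch of this suggestion, assuming the data is held in a pandas Series (the variable names `s` and `filled` are only illustrative), linear interpolation looks like the following. Note that it spaces the filled values evenly between the endpoints, so the result is 12, 14, 16, 18 rather than the 12.5, 15, 17.5, 18.75 from the step-by-step example in the question.

```python
import numpy as np
import pandas as pd

# Illustrative data: the list from the question, wrapped in a Series.
s = pd.Series([10, np.nan, np.nan, np.nan, np.nan, 20])

# Linear interpolation places the missing values at equal steps
# between the nearest known values on each side.
filled = s.interpolate(method="linear", limit_direction="both")

print(filled.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

If the exact midpoint-by-midpoint values from the question are required, interpolate alone will not produce them for gaps like this one.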

CodePudding user response:

Not sure if this is what you meant, but this solution works:

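import numpy as np

l = [10, np.nan, np.nan, np.nan, np.nan, 20]

def get_end_index():
    # Find the first known value after start_index (reads the global start_index).
    for index, val in enumerate(l[start_index + 1:]):
        if val is not np.nan:
            break
    return start_index + index + 1

while np.nan in l:
    # The element just before the first NaN is the nearest known value on the left.
    start_index = l.index(np.nan) - 1
    end_index = get_end_index()
    mid_index = (start_index + end_index) // 2

    # Fill the middle of the gap with the average of its two known ends.
    l[mid_index] = (l[start_index] + l[end_index]) / 2

print(l)  # [10, 12.5, 15.0, 17.5, 18.75, 20]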
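
The loop above detects missing values by identity with the np.nan object (`np.nan in l`, `val is not np.nan`), so it may miss NaNs that come from elsewhere, such as `float('nan')`. A possible variant, sketched here under the assumption that the list starts and ends with known values, uses `math.isnan` and a small recursive helper instead of a global index. The name `fill_by_midpoints` is hypothetical, not from the question or the answers.

```python
import math

def fill_by_midpoints(values):
    """Return a copy of values with each NaN gap filled by repeated midpoint averaging."""
    out = list(values)

    def fill(lo, hi):
        # lo and hi are indices of known values; everything strictly between is NaN.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        out[mid] = (out[lo] + out[hi]) / 2
        fill(lo, mid)   # fill the left half of the gap
        fill(mid, hi)   # fill the right half of the gap

    # Indices of the values that are already known.
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    for lo, hi in zip(known, known[1:]):
        fill(lo, hi)
    return out

print(fill_by_midpoints([10, math.nan, math.nan, math.nan, math.nan, 20]))
# [10, 12.5, 15.0, 17.5, 18.75, 20]
```

Each element is visited once, and NaNs before the first or after the last known value are left untouched.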