I have a start value and an end value, with n missing values between them. Each missing value should be filled with the average of the known values on either side of it.
Pseudo-code:
start_index = 0
end_index = len(l) - 1
while there are missing values:
    mid = (start_index + end_index) // 2
    l[mid] = (l[start_index] + l[end_index]) / 2
    update start_index and end_index
For example, with start = 10, end = 20 and 4 missing values:
step 1: l = [10, nan, nan, nan, nan, 20]
step 2: l = [10, nan, 15, nan, nan, 20] => (10 + 20)/2 = 15
step 3: l = [10, 12.5, 15, nan, nan, 20] => (10 + 15)/2 = 12.5
step 4: l = [10, 12.5, 15, 17.5, nan, 20] => (15 + 20)/2 = 17.5
step 5: l = [10, 12.5, 15, 17.5, 18.75, 20] => (17.5 + 20)/2 = 18.75
How can I do this in Python or pandas?
I have not been able to find an optimised solution for this. Any help would be appreciated.
CodePudding user response:
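You can use pandas.DataFrame.interpolate.
If simple linear interpolation between the start and end values is enough to fill the NA values, that is what pd.DataFrame.interpolate does by default. Note that it spaces the values evenly (10, 12, 14, 16, 18, 20 for the example), which is not the same as the repeated-midpoint output shown in the question.
You can also pass limit_direction='both', which controls the direction in which consecutive NaNs are filled.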
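As a rough sketch of the interpolate approach applied to the example from the question (the names s and filled are only illustrative):

```python
import numpy as np
import pandas as pd

# Example data from the question: two known endpoints with four missing values between them.
s = pd.Series([10, np.nan, np.nan, np.nan, np.nan, 20])

# The default method is 'linear', which treats the values as equally spaced.
filled = s.interpolate(limit_direction="both")
print(filled.tolist())
```

The filled values should come out evenly spaced between 10 and 20, so this only fits if an even spread is acceptable in place of the step-by-step midpoints.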
CodePudding user response:
Not sure if this is what you meant, but this solution works:
import numpy as np

l = [10, np.nan, np.nan, np.nan, np.nan, 20]

def get_end_index():
    # Find the first known value after start_index.
    for index, val in enumerate(l[start_index + 1:]):
        if val is not np.nan:
            break
    return start_index + index + 1

while np.nan in l:
    # The known value just before the first NaN is the left boundary.
    start_index = l.index(np.nan) - 1
    end_index = get_end_index()
    mid_index = int((start_index + end_index) / 2)
    l[mid_index] = (l[start_index] + l[end_index]) / 2

l
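If rescanning the list on every pass becomes a concern, one possible variant is to split each gap at its midpoint using an explicit stack, so that every position is visited once. This is only a sketch: the function name fill_midpoints is hypothetical, and it assumes the list starts and ends with known values (leading or trailing NaNs are left untouched).

```python
import math

def fill_midpoints(values):
    """Return a copy of `values` with each NaN gap filled by repeated midpoint averaging."""
    out = list(values)
    # Positions that already hold a known (non-NaN) value.
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    # Each pair of neighbouring known positions bounds one gap (possibly empty).
    stack = list(zip(known, known[1:]))
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:
            continue  # nothing missing between lo and hi
        mid = (lo + hi) // 2
        out[mid] = (out[lo] + out[hi]) / 2
        # Both halves now have known endpoints and can be filled independently.
        stack.append((lo, mid))
        stack.append((mid, hi))
    return out

nan = float("nan")
print(fill_midpoints([10, nan, nan, nan, nan, 20]))
# [10, 12.5, 15.0, 17.5, 18.75, 20]
```

Because each filled value depends only on the two endpoints of its gap, the order in which the halves are processed does not change the result.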