FillNaN with multiple conditions and using n-1 and n+2 values with Pandas


I have the following data frame:

import numpy as np
import pandas as pd

d = {'T': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'Val1': [10, np.NaN, 14, np.NaN, np.NaN, np.NaN, 20, np.NaN, np.NaN, 30]}
df = pd.DataFrame(data=d)

T   Val1
1   10.0
2   NaN
3   14.0
4   NaN
5   NaN
6   NaN
7   20.0
8   NaN
9   NaN
10  30.0

I want to fill the NaN with different values depending on certain conditions:

  1. If the value V is NaN and if V+1 and V-1 are not NaN, then V = np.mean([V+1, V-1])
  2. If the values V and V+1 are NaN and if V-1 and V+2 are not NaN, then I want to fill them following this formula: V = np.cbrt((V-1)*(V-1)*(V+2)) AND V+1 = np.cbrt((V-1)*(V+2)*(V+2))
  3. Other NaN should be removed

So the wanted datatable should look like:

T   Val1
1   10.0
2   12.0
3   14.0
7   20.0
8   22.89
9   26.21
10  30.0
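As a sanity check, the expected values follow directly from the formulas above (T=2 is a single gap between 10 and 14; T=8 and T=9 form a double gap between 20 and 30):

```python
import numpy as np

# Condition 1: T=2 sits between 10 (V-1) and 14 (V+1)
v2 = np.mean([10, 14])       # 12.0

# Condition 2: T=8 and T=9 sit between 20 (V-1) and 30 (V+2)
v8 = np.cbrt(20 * 20 * 30)   # ~22.894
v9 = np.cbrt(20 * 30 * 30)   # ~26.207

print(round(v2, 2), round(v8, 2), round(v9, 2))
```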

I was able to do the V = np.mean([V+1, V-1]) by the following command:

df1 = pd.concat([df.ffill(), df.bfill()]).groupby(level=0).mean()

T   Val1
1   10.0
2   12.0
3   14.0
4   17.0
5   17.0
6   17.0
7   20.0
8   25.0
9   25.0
10  30.0
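As a side note (an equivalent formulation, not part of the original post): the concat/groupby round-trip gives the same result as averaging the forward- and backward-filled columns directly, which makes it clearer why every gap, including the three-wide one at T=4-6, gets the midpoint of its nearest non-NaN neighbours:

```python
import numpy as np
import pandas as pd

d = {'T': range(1, 11),
     'Val1': [10, np.nan, 14, np.nan, np.nan, np.nan, 20, np.nan, np.nan, 30]}
df = pd.DataFrame(d)

# Round-trip from the question
df1 = pd.concat([df.ffill(), df.bfill()]).groupby(level=0).mean()
# Direct average of the two filled columns
df2 = (df.ffill() + df.bfill()) / 2

print(df1['Val1'].equals(df2['Val1']))  # the two approaches agree
```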

But I don't know how to incorporate the different conditions. I tried using np.select() but I can't find a way to recover the following and previous values and add them to the conditions.
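For what it's worth, `shift()` is the usual way to line the previous/next values up with each row so they can feed `np.select`; a minimal sketch covering only the single-gap case (condition 1), not the full solution:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 14, np.nan, np.nan, np.nan, 20, np.nan, np.nan, 30])

prev_val = s.shift(1)    # V-1 aligned with V
next_val = s.shift(-1)   # V+1 aligned with V
single_gap = s.isna() & prev_val.notna() & next_val.notna()

# np.select picks the mean where the gap is exactly one NaN wide,
# and keeps the original value (NaN included) everywhere else
filled = pd.Series(np.select([single_gap],
                             [(prev_val + next_val) / 2],
                             default=s.to_numpy()))
```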

Thanks a lot

CodePudding user response:

You can use:

def condition_2(a, b):  # a = V-1, b = V+2
    return np.cbrt(a * a * b)

def condition_3(a, b):  # a = V-2, b = V+1
    return np.cbrt(a * b * b)


import numpy as np
import pandas as pd

d = {'T': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'Val1': [10, np.NaN, 14, np.NaN, np.NaN, np.NaN, 20, np.NaN, np.NaN, 30]}
df = pd.DataFrame(data=d)
cond_1 = df['Val1'].isnull() & df['Val1'].shift(1).notna() & df['Val1'].shift(-1).notna()
cond_2 = df['Val1'].isnull() & df['Val1'].shift(1).notna() & df['Val1'].shift(-1).isnull() & df['Val1'].shift(-2).notna()
cond_3 = df['Val1'].isnull() & df['Val1'].shift(-1).notna() & df['Val1'].shift(1).isnull() & df['Val1'].shift(2).notna()

df['Val1'] = np.where(cond_1, (df['Val1'].shift(1) + df['Val1'].shift(-1)) / 2, df['Val1'])
df['Val1'] = np.where(cond_2, condition_2(df['Val1'].shift(1), df['Val1'].shift(-2)), df['Val1'])
df['Val1'] = np.where(cond_3, condition_3(df['Val1'].shift(2), df['Val1'].shift(-1)), df['Val1'])

df.dropna(subset=['Val1'], inplace=True)

Output:

    T       Val1
0   1  10.000000
1   2  12.000000
2   3  14.000000
6   7  20.000000
7   8  22.894285
8   9  26.207414
9  10  30.000000

CodePudding user response:

Here's one solution using np.split and a custom function. Basically, split on the non-NaN values and iterate over each split to decide whether to drop or fill its NaNs:
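To illustrate what the split step produces (using the question's data): splitting at the positions of the non-NaN values yields one chunk per known value plus its trailing NaNs, with the leading empty chunk dropped by `[1:]`:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 14, np.nan, np.nan, np.nan, 20, np.nan, np.nan, 30], name='Val1')

# Split at every index holding a non-NaN value; drop the empty leading piece
splits = np.split(s, np.where(pd.notna(s))[0])[1:]
for chunk in splits:
    print(chunk.tolist())
# [10.0, nan]
# [14.0, nan, nan, nan]
# [20.0, nan, nan]
# [30.0]
```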

def nan2notna(arr1, arr2):
    # arr1: a chunk starting with one non-NaN value followed by its NaNs
    # arr2: the next chunk (its first element is the next known value), or None for the last chunk
    mask = pd.isna(arr1)
    if len(arr1[mask]) > 2:           # gap longer than two: drop the NaNs
        return arr1[~mask]
    else:
        if len(arr1[mask]) == 2:      # two NaNs: cube-root (geometric) interpolation
            arr1[mask] = [np.cbrt(arr1.iloc[0] * arr1.iloc[0] * arr2.iloc[0]),
                          np.cbrt(arr1.iloc[0] * arr2.iloc[0] * arr2.iloc[0])]
        elif len(arr1[mask]) == 1:    # single NaN: mean of the two neighbours
            arr1[mask] = np.mean([arr1.iloc[0], arr2.iloc[0]])
        else:
            pass
        return arr1

splits = np.split(df['Val1'], np.where(pd.notna(df['Val1']))[0])[1:]
out = (df.merge(pd.concat([nan2notna(arr1, arr2) for (arr1, arr2) in zip(splits, splits[1:] + [None])]).to_frame(), 
               left_index=True, right_index=True)
       .drop(columns='Val1_x')
       .rename(columns={'Val1_y':'Val1'})
       .round(2))    

Output:

    T   Val1
0   1  10.00
1   2  12.00
2   3  14.00
6   7  20.00
7   8  22.89
8   9  26.21
9  10  30.00