I have the following data frame:
import numpy as np
import pandas as pd

d = {'T': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'Val1': [10, np.NaN, 14, np.NaN, np.NaN, np.NaN, 20, np.NaN, np.NaN, 30]}
df = pd.DataFrame(data=d)
T Val1
1 10.0
2 NaN
3 14.0
4 NaN
5 NaN
6 NaN
7 20.0
8 NaN
9 NaN
10 30.0
I want to fill the NaN with different values depending on certain conditions:
- If the value V is NaN, and the previous value V-1 and the next value V+1 are not NaN, then V = np.mean([V-1, V+1]).
- If the values V and V+1 are both NaN, and V-1 and V+2 are not NaN, then I want to fill them following these formulas: V = np.cbrt((V-1) * (V-1) * (V+2)) and V+1 = np.cbrt((V-1) * (V+2) * (V+2)) (see the worked example below).
- Any other NaN should be removed.
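For instance, for the two consecutive NaNs at T=8 and T=9, the surrounding known values are V-1 = 20 and V+2 = 30, so:
np.cbrt(20 * 20 * 30)  # ≈ 22.894, fills T=8
np.cbrt(20 * 30 * 30)  # ≈ 26.207, fills T=9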
So the resulting DataFrame should look like:
T Val1
1 10.0
2 12.0
3 14.0
7 20.0
8 22.89
9 26.20
10 30.0
I was able to implement V = np.mean([V-1, V+1]) with the following command:
df1 = pd.concat([df.ffill(), df.bfill()]).groupby(level=0).mean()
T Val1
1 10.0
2 12.0
3 14.0
4 17.0
5 17.0
6 17.0
7 20.0
8 25.0
9 25.0
10 30.0
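Here df.ffill() carries the last valid value forward and df.bfill() carries the next valid value backward, so taking the mean per original row index averages the previous and next known values. For a single-row gap this is exactly np.mean([V-1, V+1]), but for longer gaps every missing row gets the same average (17.0 for T=4-6, 25.0 for T=8-9), which doesn't satisfy the other conditions.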
But I don't know how to incorporate the different conditions. I tried using np.select(), but I can't find a way to reference the previous and following values inside the conditions.
Thanks a lot
CodePudding user response:
You can use:
def condition_2(a, b):  # a = V-1, b = V+2
    return np.cbrt(a * a * b)

def condition_3(a, b):  # a = V-2, b = V+1
    return np.cbrt(a * b * b)

d = {'T': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'Val1': [10, np.NaN, 14, np.NaN, np.NaN, np.NaN, 20, np.NaN, np.NaN, 30]}
df = pd.DataFrame(data=d)

# Single NaN with known values on both sides
cond_1 = df['Val1'].isnull() & df['Val1'].shift(1).notna() & df['Val1'].shift(-1).notna()
# First of two consecutive NaNs: V-1 and V+2 are known
cond_2 = df['Val1'].isnull() & df['Val1'].shift(1).notna() & df['Val1'].shift(-1).isnull() & df['Val1'].shift(-2).notna()
# Second of two consecutive NaNs: V-2 and V+1 are known
cond_3 = df['Val1'].isnull() & df['Val1'].shift(-1).notna() & df['Val1'].shift(1).isnull() & df['Val1'].shift(2).notna()

df['Val1'] = np.where(cond_1, (df['Val1'].shift(1) + df['Val1'].shift(-1)) / 2, df['Val1'])
df['Val1'] = np.where(cond_2, condition_2(df['Val1'].shift(1), df['Val1'].shift(-2)), df['Val1'])
df['Val1'] = np.where(cond_3, condition_3(df['Val1'].shift(2), df['Val1'].shift(-1)), df['Val1'])

# Remaining NaNs match none of the conditions, so drop them
df.dropna(subset=['Val1'], inplace=True)
OUTPUT
T Val1
0 1 10.000000
1 2 12.000000
2 3 14.000000
6 7 20.000000
7 8 22.894285
8 9 26.207414
9 10 30.000000
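As a side note, these masks work because shift aligns a neighbour with the current row, e.g.:
s = pd.Series([10, np.NaN, 14])
s.shift(1)   # previous value per row: [NaN, 10, NaN]
s.shift(-1)  # next value per row:     [NaN, 14, NaN]
For the middle NaN, shift(1) exposes the previous value (10) and shift(-1) the next (14), which is exactly what cond_1 tests.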
CodePudding user response:
Here's one solution using np.split and a custom function. Basically, split the series on the non-NaN values and iterate over each segment to decide whether to drop its NaNs or fill them:
def nan2notna(arr1, arr2):
    # arr1 is a segment (one known value followed by its NaNs);
    # arr2 is the next segment, whose first element is the next known value
    mask = pd.isna(arr1)
    if len(arr1[mask]) > 2:
        # More than two consecutive NaNs: drop them
        return arr1[~mask]
    elif len(arr1[mask]) == 2:
        # Two consecutive NaNs: apply the cube-root formulas
        arr1[mask] = [np.cbrt(arr1.iloc[0] * arr1.iloc[0] * arr2.iloc[0]),
                      np.cbrt(arr1.iloc[0] * arr2.iloc[0] * arr2.iloc[0])]
    elif len(arr1[mask]) == 1:
        # Single NaN: average the surrounding known values
        arr1[mask] = np.mean([arr1.iloc[0], arr2.iloc[0]])
    return arr1
# Split right before every non-NaN position; drop the empty leading piece
splits = np.split(df['Val1'], np.where(pd.notna(df['Val1']))[0])[1:]
out = (df.merge(pd.concat([nan2notna(arr1, arr2) for (arr1, arr2) in zip(splits, splits[1:] + [None])]).to_frame(),
                left_index=True, right_index=True)
         .drop(columns='Val1_x')
         .rename(columns={'Val1_y': 'Val1'})
         .round(2))
Output:
T Val1
0 1 10.00
1 2 12.00
2 3 14.00
6 7 20.00
7 8 22.89
8 9 26.21
9 10 30.00
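For clarity: np.split cuts the series right before each non-NaN position, so each segment is one known value followed by its trailing NaNs, and zip(splits, splits[1:] + [None]) pairs each segment with the next one so that arr2.iloc[0] supplies the next known value. The final segment is paired with None, which is safe here because the series ends with a known value, so that segment has no NaN to fill.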