I have this df with 9 columns
x y1_x y2_x y3_x y4_x
0 -17.7 -0.785430 NaN NaN NaN
1 -15.0 NaN NaN NaN -3820.085000
2 -12.5 NaN NaN 2.138833 NaN
3 -12.4 NaN NaN 1.721205 NaN
4 -12.1 NaN 2.227343 2.227343 NaN
d1 d2 d3 d4
0 0.053884 NaN NaN NaN
1 NaN NaN NaN 0.085000
2 NaN NaN 0.143237 NaN
3 NaN NaN 0.251180 NaN
4 NaN 0.127343 0.440931 NaN
Between y1_x
and y4_x
I can only have 1 non NaN
value per row.
The condition to choose which value is removed is explained in this example:
In row 4
there are 2 values between y1_x
and y4_x
The value that becomes NaN
is the one from y3_x
because in that same row, d3 > d2
CodePudding user response:
you can use:
#get count of nans between ('y1_x', 'y2_x', 'y3_x', 'y4_x', 'd1', 'd2', 'd3', 'd4')
final['mask']=final.iloc[:,1:5].isna().sum(axis=1)
#if the mask is 2, it means it will be filled with nan.
count=len(final[final['mask']==2])
#We enter the loop as many as the number of rows with a mask value of 2.
for i in range(0,count):
mask=final[final['mask']==2].iloc[[i]] #get nth row
#get columns (('y1_x', 'y2_x', 'y3_x', 'y4_x', 'd1', 'd2', 'd3', 'd4')) and drop nans
abc=mask[mask['mask']==2].iloc[:,1:8].dropna(axis=1,how='all')
to_replace_col_name=abc.iloc[:,2:4].max(axis=0).reset_index()['index'][1] #which column is bigger ?
#Once we know the name of the big column, the 2 columns before this column is the one we will change.
replace_col=abc.columns.get_loc(to_replace_col_name)
replace_col=abc.columns[replace_col - 2]
#now we know which column to change
mask[replace_col]=np.nan
final.loc[mask.index,:]=mask #replace row according to index name
print(final)
'''
x y1_x y2_x y3_x ... d2 d3 d4 mask
0 -17.7 -0.78543 NaN NaN ... NaN NaN NaN 3
1 -15.0 NaN NaN NaN ... NaN NaN 0.085 3
2 -12.5 NaN NaN 2.138833 ... NaN 0.143237 NaN 3
3 -12.4 NaN NaN 1.721205 ... NaN 0.251180 NaN 3
4 -12.1 NaN 2.227343 NaN ... 0.127343 0.440931 NaN 2
'''
CodePudding user response:
Another possible solution:
auxd = df.filter(like='d')
auxy = df.filter(like='y')
auxd.columns = auxy.columns
colmin = auxd.idxmin(axis=1)
df[auxy.columns] = auxy.apply(lambda x: x.where(colmin.eq(x.name)))
Output:
x y1_x y2_x y3_x y4_x d1 d2 d3 d4
0 -17.7 -0.78543 NaN NaN NaN 0.053884 NaN NaN NaN
1 -15.0 NaN NaN NaN -3820.085 NaN NaN NaN 0.085
2 -12.5 NaN NaN 2.138833 NaN NaN NaN 0.143237 NaN
3 -12.4 NaN NaN 1.721205 NaN NaN NaN 0.251180 NaN
4 -12.1 NaN 2.227343 NaN NaN NaN 0.127343 0.440931 NaN