Remove a value of a data frame based on a condition between columns-CodePudding

I have this df with 9 columns

       x      y1_x      y2_x        y3_x         y4_x      
0  -17.7 -0.785430       NaN         NaN          NaN  
1  -15.0       NaN       NaN         NaN -3820.085000    
2  -12.5       NaN       NaN    2.138833          NaN   
3  -12.4       NaN       NaN    1.721205          NaN    
4  -12.1       NaN  2.227343    2.227343          NaN  

           d1        d2          d3          d4 
0    0.053884       NaN         NaN         NaN  
1         NaN       NaN         NaN    0.085000  
2         NaN       NaN    0.143237         NaN  
3         NaN       NaN    0.251180         NaN  
4         NaN  0.127343    0.440931         NaN

Between y1_x and y4_x I can only have 1 non NaN value per row.

The condition to choose which value is removed is explained in this example:

In row 4 there are 2 values between y1_x and y4_x

The value that becomes NaN is the one from y3_x because in that same row, d3 > d2

CodePudding user response：

you can use:

#get count of nans between ('y1_x', 'y2_x', 'y3_x', 'y4_x', 'd1', 'd2', 'd3', 'd4')
final['mask']=final.iloc[:,1:5].isna().sum(axis=1)

#if the mask is 2, it means it will be filled with nan.
count=len(final[final['mask']==2])

#We enter the loop as many as the number of rows with a mask value of 2.
for i in range(0,count):
    mask=final[final['mask']==2].iloc[[i]] #get nth row

    #get columns (('y1_x', 'y2_x', 'y3_x', 'y4_x', 'd1', 'd2', 'd3', 'd4')) and drop nans
    abc=mask[mask['mask']==2].iloc[:,1:8].dropna(axis=1,how='all')
    to_replace_col_name=abc.iloc[:,2:4].max(axis=0).reset_index()['index'][1] #which column is bigger ?

    #Once we know the name of the big column, the 2 columns before this column is the one we will change.

    replace_col=abc.columns.get_loc(to_replace_col_name) 
    replace_col=abc.columns[replace_col - 2]

    #now we know which column to change
    mask[replace_col]=np.nan
    final.loc[mask.index,:]=mask #replace row according to index name

print(final)
'''
      x     y1_x      y2_x      y3_x  ...        d2        d3     d4  mask
0 -17.7 -0.78543       NaN       NaN  ...       NaN       NaN    NaN     3
1 -15.0      NaN       NaN       NaN  ...       NaN       NaN  0.085     3
2 -12.5      NaN       NaN  2.138833  ...       NaN  0.143237    NaN     3
3 -12.4      NaN       NaN  1.721205  ...       NaN  0.251180    NaN     3
4 -12.1      NaN  2.227343       NaN  ...  0.127343  0.440931    NaN     2

'''

CodePudding user response：

Another possible solution:

auxd = df.filter(like='d')
auxy = df.filter(like='y')
auxd.columns = auxy.columns
colmin = auxd.idxmin(axis=1)

df[auxy.columns] = auxy.apply(lambda x: x.where(colmin.eq(x.name)))

Output:

      x     y1_x      y2_x      y3_x      y4_x        d1        d2        d3     d4
0 -17.7 -0.78543       NaN       NaN       NaN  0.053884       NaN       NaN    NaN
1 -15.0      NaN       NaN       NaN -3820.085       NaN       NaN       NaN  0.085
2 -12.5      NaN       NaN  2.138833       NaN       NaN       NaN  0.143237    NaN
3 -12.4      NaN       NaN  1.721205       NaN       NaN       NaN  0.251180    NaN
4 -12.1      NaN  2.227343       NaN       NaN       NaN  0.127343  0.440931    NaN