Keeping the minimum row value across multiple columns and the value of closest neighboring column

Time:09-14

Given the example table

import pandas as pd

df = pd.DataFrame({'A':[8,4,8,4,9],'Ap':[0.001,0.06,0.001,0.1,0.002],'B':[7,3,9,3,6],
                  'Bp':[0.005,0.006,0.01,0.007,0.06],'C':[4,1,4,8,9],
                  'Cp':[0.004,0.008,0.2,0.006,0.00001]}, index=['x','y','z','zz','yz'])

That looks like this:

    A     Ap    B   Bp      C   Cp
x   8   0.001   7   0.005   4   0.00400
y   4   0.060   3   0.006   1   0.00800
z   8   0.001   9   0.010   4   0.20000
zz  4   0.100   3   0.007   8   0.00600
yz  9   0.002   6   0.060   9   0.00001

I'd like to keep/record, for each row, the lowest value among the columns (A, B, C):

new = pd.DataFrame()
new['Minimum'] = df[[df.columns[0],df.columns[2],df.columns[4]]].min(axis=1)

The result looks like this:

    Minimum
x   4
y   1
z   4
zz  3
yz  6

But I'd also like to record the pval associated with the minimum value kept (Ap, Bp, Cp) and I'm unsure how to accomplish that.

So for example the final result should look like this

    Minimum pVal
x   4       0.004
y   1       0.008
z   4       0.200
zz  3       0.007
yz  6       0.060

CodePudding user response:

Let's use idxmin to get the column names corresponding to the row-wise minimum values, then use advanced indexing with numpy to pull out the minimum values and their associated p-values:

c = ['A', 'B', 'C']
x, y = range(len(df)), df[c].idxmin(1)

df['min'] = df.values[x, df.columns.get_indexer_for(y)]
df['pVal'] = df.values[x, df.columns.get_indexer_for(y + 'p')]

Result

    A     Ap  B     Bp  C       Cp  min   pVal
x   8  0.001  7  0.005  4  0.00400  4.0  0.004
y   4  0.060  3  0.006  1  0.00800  1.0  0.008
z   8  0.001  9  0.010  4  0.20000  4.0  0.200
zz  4  0.100  3  0.007  8  0.00600  3.0  0.007
yz  9  0.002  6  0.060  9  0.00001  6.0  0.060
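
To make the two lookups above easier to follow, here is a small sketch that prints the intermediate values for the example table. The names value_pos and pval_pos are only illustrative and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [8, 4, 8, 4, 9], 'Ap': [0.001, 0.06, 0.001, 0.1, 0.002],
                   'B': [7, 3, 9, 3, 6], 'Bp': [0.005, 0.006, 0.01, 0.007, 0.06],
                   'C': [4, 1, 4, 8, 9], 'Cp': [0.004, 0.008, 0.2, 0.006, 0.00001]},
                  index=['x', 'y', 'z', 'zz', 'yz'])

c = ['A', 'B', 'C']

# Column label of the row-wise minimum among A, B, C.
y = df[c].idxmin(axis=1)
print(y.tolist())          # ['C', 'C', 'C', 'B', 'B']

# Zero-based positions of those labels in df.columns.
value_pos = df.columns.get_indexer_for(y)
print(value_pos.tolist())  # [4, 4, 4, 2, 2]

# Appending 'p' to each label gives the matching p-value column.
pval_pos = df.columns.get_indexer_for(y + 'p')
print(pval_pos.tolist())   # [5, 5, 5, 3, 3]
```
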
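
Some details
  • idxmin(1): for each row, returns the name of the column that holds the minimum value (the first such column if there is a tie)
  • df.columns.get_indexer_for: returns the zero-based integer positions of the given column labels, which can then be used to index the underlying array
  • df.values: returns a single float array here because the columns mix integers and floats, which is why min is displayed as 4.0 rather than 4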
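
If you want a separate frame with only Minimum and pVal, as in the question, one possible variation is sketched below. It is not the code from the answer: it assumes every value column has a p-value column named with a trailing 'p', and it uses numpy's argmin on the two sub-blocks instead of idxmin. The names value_cols, pval_cols, pos and rows are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [8, 4, 8, 4, 9], 'Ap': [0.001, 0.06, 0.001, 0.1, 0.002],
                   'B': [7, 3, 9, 3, 6], 'Bp': [0.005, 0.006, 0.01, 0.007, 0.06],
                   'C': [4, 1, 4, 8, 9], 'Cp': [0.004, 0.008, 0.2, 0.006, 0.00001]},
                  index=['x', 'y', 'z', 'zz', 'yz'])

value_cols = ['A', 'B', 'C']
pval_cols = [col + 'p' for col in value_cols]  # assumed naming: 'Ap', 'Bp', 'Cp'

values = df[value_cols].to_numpy()  # integer block
pvals = df[pval_cols].to_numpy()    # float block, same column order

# Position (0, 1 or 2) of the minimum within each row; ties pick the first column.
pos = values.argmin(axis=1)
rows = np.arange(len(df))

new = pd.DataFrame({'Minimum': values[rows, pos],
                    'pVal': pvals[rows, pos]},
                   index=df.index)
print(new)
```

Because the value columns and the p-value columns are indexed as separate arrays, Minimum keeps its integer dtype instead of being upcast to float.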