Find the minimum and maximum values in the columns of a pandas data frame-CodePudding

I did look for a question similar to mine, but I did not find any answers still.

I have a data frame like this:

          achaea    bacteria    plastids    mitochondrion   viruses
CTAG    -22.141701  -27.891441  -2.474725   0.262533    0.026349
GGCC    -13.403537  -21.490028  -0.403491   -0.271403   -0.243087
GATC    -20.933825  -14.761891  4.681494    -0.098965   0.088650
CATG    -8.490766   -9.910195   1.150736    -0.005730   0.508743
TAAG    -17.376165  -18.653078  -1.525354   -0.708633   -1.917676

And my doubt is: 'How can I get the minimum and maximum values from the columns?" I would like some like this:

Minimum values:

archaea     CTAG  -22.141701 
bacteria    CTAG  -27.891441   
plastids    CTAG  -2.474725
mitochondrion   TAAG -0.708633
viruses    TAAG    -1.917676

Maximum values:

archaea     CATG    -8.4907661 
bacteria    CATG    -9.910195   
plastids    GATC  4.681494
mitochondrion   CTAG 0.262533
viruses    CATG    0.508743

I have tried:

df.min()
achaea          -22.141701
bacteria        -27.891441
plastids         -4.654833
mitochondrion    -0.881587
viruses          -1.917676
dtype: float64

df['achaea'].idxmin()
'CTAG'

df.reset_index().min()
index                AAAA
achaea           -22.1417
bacteria         -27.8914
plastids         -4.65483
mitochondrion   -0.881587
viruses          -1.91768
dtype: object

Well, I tried this and it is very close:

for col, idx in zip(df.columns, df.index):
    print(df[col].min(), idx, col)

-22.141701229820306 CTAG archaea
-27.89144069672985 GGCC bacteria
-4.654832775512324 GATC plastids
-0.8815871622500514 CATG mitochondrion
-1.917675731085761 TAAG viruses

Any suggestion to improve the efficiency and the code would be appreciate! Thank you for your time and effort. Paulo

CodePudding user response：

You can use:

df.where(df.eq(df.min())).T.stack()

identify the column values equal to the min per column
mask the other values
stack to reshape while dropping the NaNs (transpose first to change the order or the index)

output:

achaea         CTAG   -22.141701
bacteria       CTAG   -27.891441
plastids       CTAG    -2.474725
mitochondrion  TAAG    -0.708633
viruses        TAAG    -1.917676
dtype: float64

CodePudding user response：

An interesting option is agg with a list of functions:

result = df.agg([min, max])

For your data sample I got:

        achaea   bacteria  plastids  mitochondrion   viruses
min -22.141701 -27.891441 -2.474725      -0.708633 -1.917676
max  -8.490766  -9.910195  4.681494       0.262533  0.508743

But if you want min/max values with their indices, then:

Define the following function:

def xx(col):
    iMin = col.idxmin()
    iMax = col.idxmax()
    return pd.Series([col[iMin], col[iMax]],
        index=[[col.name, col.name], [iMin, iMax]])

Concatenate results for each column:

result = pd.concat([ xx(df[col]) for col in df ])

The result is:

achaea         CTAG   -22.141701
               CATG    -8.490766
bacteria       CTAG   -27.891441
               CATG    -9.910195
plastids       CTAG    -2.474725
               GATC     4.681494
mitochondrion  TAAG    -0.708633
               CTAG     0.262533
viruses        TAAG    -1.917676
               CATG     0.508743
dtype: float64

The first index level shows the column name.

And the second level shows index values of min / max value in this column.