Home > OS >  Find the minimum and maximum values in the columns of a pandas data frame
Find the minimum and maximum values in the columns of a pandas data frame

Time:05-31

I did look for a question similar to mine, but I did not find any answers still.

I have a data frame like this:

          achaea    bacteria    plastids    mitochondrion   viruses
CTAG    -22.141701  -27.891441  -2.474725   0.262533    0.026349
GGCC    -13.403537  -21.490028  -0.403491   -0.271403   -0.243087
GATC    -20.933825  -14.761891  4.681494    -0.098965   0.088650
CATG    -8.490766   -9.910195   1.150736    -0.005730   0.508743
TAAG    -17.376165  -18.653078  -1.525354   -0.708633   -1.917676

And my doubt is: 'How can I get the minimum and maximum values from the columns?" I would like some like this:

Minimum values:

archaea     CTAG  -22.141701 
bacteria    CTAG  -27.891441   
plastids    CTAG  -2.474725
mitochondrion   TAAG -0.708633
viruses    TAAG    -1.917676

Maximum values:

archaea     CATG    -8.4907661 
bacteria    CATG    -9.910195   
plastids    GATC  4.681494
mitochondrion   CTAG 0.262533
viruses    CATG    0.508743

I have tried:

df.min()
achaea          -22.141701
bacteria        -27.891441
plastids         -4.654833
mitochondrion    -0.881587
viruses          -1.917676
dtype: float64

df['achaea'].idxmin()
'CTAG'

df.reset_index().min()
index                AAAA
achaea           -22.1417
bacteria         -27.8914
plastids         -4.65483
mitochondrion   -0.881587
viruses          -1.91768
dtype: object

Well, I tried this and it is very close:

for col, idx in zip(df.columns, df.index):
    print(df[col].min(), idx, col)

-22.141701229820306 CTAG archaea
-27.89144069672985 GGCC bacteria
-4.654832775512324 GATC plastids
-0.8815871622500514 CATG mitochondrion
-1.917675731085761 TAAG viruses

Any suggestion to improve the efficiency and the code would be appreciate! Thank you for your time and effort. Paulo

CodePudding user response:

You can use:

df.where(df.eq(df.min())).T.stack()
  • identify the column values equal to the min per column
  • mask the other values
  • stack to reshape while dropping the NaNs (transpose first to change the order or the index)

output:

achaea         CTAG   -22.141701
bacteria       CTAG   -27.891441
plastids       CTAG    -2.474725
mitochondrion  TAAG    -0.708633
viruses        TAAG    -1.917676
dtype: float64

CodePudding user response:

An interesting option is agg with a list of functions:

result = df.agg([min, max])

For your data sample I got:

        achaea   bacteria  plastids  mitochondrion   viruses
min -22.141701 -27.891441 -2.474725      -0.708633 -1.917676
max  -8.490766  -9.910195  4.681494       0.262533  0.508743

But if you want min/max values with their indices, then:

  1. Define the following function:

    def xx(col):
        iMin = col.idxmin()
        iMax = col.idxmax()
        return pd.Series([col[iMin], col[iMax]],
            index=[[col.name, col.name], [iMin, iMax]])
    
  2. Concatenate results for each column:

    result = pd.concat([ xx(df[col]) for col in df ])
    

The result is:

achaea         CTAG   -22.141701
               CATG    -8.490766
bacteria       CTAG   -27.891441
               CATG    -9.910195
plastids       CTAG    -2.474725
               GATC     4.681494
mitochondrion  TAAG    -0.708633
               CTAG     0.262533
viruses        TAAG    -1.917676
               CATG     0.508743
dtype: float64

The first index level shows the column name.

And the second level shows index values of min / max value in this column.

  • Related