I did look for a question similar to mine, but I did not find any answers still.
I have a data frame like this:
achaea bacteria plastids mitochondrion viruses
CTAG -22.141701 -27.891441 -2.474725 0.262533 0.026349
GGCC -13.403537 -21.490028 -0.403491 -0.271403 -0.243087
GATC -20.933825 -14.761891 4.681494 -0.098965 0.088650
CATG -8.490766 -9.910195 1.150736 -0.005730 0.508743
TAAG -17.376165 -18.653078 -1.525354 -0.708633 -1.917676
And my doubt is: 'How can I get the minimum and maximum values from the columns?" I would like some like this:
Minimum values:
archaea CTAG -22.141701
bacteria CTAG -27.891441
plastids CTAG -2.474725
mitochondrion TAAG -0.708633
viruses TAAG -1.917676
Maximum values:
archaea CATG -8.4907661
bacteria CATG -9.910195
plastids GATC 4.681494
mitochondrion CTAG 0.262533
viruses CATG 0.508743
I have tried:
df.min()
achaea -22.141701
bacteria -27.891441
plastids -4.654833
mitochondrion -0.881587
viruses -1.917676
dtype: float64
df['achaea'].idxmin()
'CTAG'
df.reset_index().min()
index AAAA
achaea -22.1417
bacteria -27.8914
plastids -4.65483
mitochondrion -0.881587
viruses -1.91768
dtype: object
Well, I tried this and it is very close:
for col, idx in zip(df.columns, df.index):
print(df[col].min(), idx, col)
-22.141701229820306 CTAG archaea
-27.89144069672985 GGCC bacteria
-4.654832775512324 GATC plastids
-0.8815871622500514 CATG mitochondrion
-1.917675731085761 TAAG viruses
Any suggestion to improve the efficiency and the code would be appreciate! Thank you for your time and effort. Paulo
CodePudding user response:
You can use:
df.where(df.eq(df.min())).T.stack()
- identify the column values equal to the min per column
- mask the other values
- stack to reshape while dropping the NaNs (transpose first to change the order or the index)
output:
achaea CTAG -22.141701
bacteria CTAG -27.891441
plastids CTAG -2.474725
mitochondrion TAAG -0.708633
viruses TAAG -1.917676
dtype: float64
CodePudding user response:
An interesting option is agg with a list of functions:
result = df.agg([min, max])
For your data sample I got:
achaea bacteria plastids mitochondrion viruses
min -22.141701 -27.891441 -2.474725 -0.708633 -1.917676
max -8.490766 -9.910195 4.681494 0.262533 0.508743
But if you want min/max values with their indices, then:
Define the following function:
def xx(col): iMin = col.idxmin() iMax = col.idxmax() return pd.Series([col[iMin], col[iMax]], index=[[col.name, col.name], [iMin, iMax]])
Concatenate results for each column:
result = pd.concat([ xx(df[col]) for col in df ])
The result is:
achaea CTAG -22.141701
CATG -8.490766
bacteria CTAG -27.891441
CATG -9.910195
plastids CTAG -2.474725
GATC 4.681494
mitochondrion TAAG -0.708633
CTAG 0.262533
viruses TAAG -1.917676
CATG 0.508743
dtype: float64
The first index level shows the column name.
And the second level shows index values of min / max value in this column.