Home > database >  Extracting values of a pandas DataFrame using maximal indexes from another DataFrame
Extracting values of a pandas DataFrame using maximal indexes from another DataFrame

Time:10-29

I have two pandas dataframes:

df1 = pd.DataFrame({
 'a': [1, 2, 3],
 'b': [2, 4, 2],
 'c': [3, 1, 5],
 'd': [-1, 0, 1]
}, index=('A', 'B', 'C'))
df2 = pd.DataFrame({
 'a': [0, 5, 10],
 'b': [9, 5, 1],
 'c': [3, 4, 2],
 'd': [12, 3, 0]
}, index=('A', 'B', 'C'))

>>> print(df1)
   a  b  c  d
A  1  2  3 -1
B  2  4  1  0
C  3  2  5  1
>>> print(df2)
    a  b  c   d
A   0  9  3  12
B   5  5  4   3
C  10  1  2   0

I would like to get the value from each column of df1 that corresponds to (i.e. is at the same coordinates as) the maximal value of that same column in df2. So in the above example it should return the values [3, 2, 1, -1]. I was able to get the correct indices using idxmax:

>>> print(df2.idxmax())
a    C
b    A
c    B
d    A
dtype: object

As you can see, these are indeed the indices corresponding to the column-wise maximums in df2. However trying to index into df1 using these indices does not return the desired result:

>>> print(df1.loc[df2.idxmax()])
   a  b  c  d
C  3  2  5  1
A  1  2  3 -1
B  2  4  1  0
A  1  2  3 -1

This indexing seems to only use the values of the Series returned by idxmax. How do I correctly index df1 using both parts (labels and values) of the returned Series? Or is there maybe a simpler solution to achieve what I want?

CodePudding user response:

Try this:

s = df2.idxmax()
[df1.loc[row, col] for col, row in s.items()]

CodePudding user response:

If you want a vectorial solution (not a loop), you can use :

import numpy as np

a = df1.to_numpy()[np.argmax(df2.to_numpy(), axis=0),
                   np.arange(df1.shape[1])]
# array([ 3,  2,  1, -1])

# as Series
out = pd.Series(a, index=df1.columns)

output:

a    3
b    2
c    1
d   -1
dtype: int64

If you have only 1 max value per column, you can use this pandas alternative:

out = df1.where(df2.eq(df2.max())).T.stack()

output:

a  C    3.0
b  A    2.0
c  B    1.0
d  A   -1.0
dtype: float64

NB. if you want you can droplevel(-1) for the same output as the first approach.

  • Related