I have two pandas dataframes:
df1 = pd.DataFrame({
'a': [1, 2, 3],
'b': [2, 4, 2],
'c': [3, 1, 5],
'd': [-1, 0, 1]
}, index=('A', 'B', 'C'))
df2 = pd.DataFrame({
'a': [0, 5, 10],
'b': [9, 5, 1],
'c': [3, 4, 2],
'd': [12, 3, 0]
}, index=('A', 'B', 'C'))
>>> print(df1)
a b c d
A 1 2 3 -1
B 2 4 1 0
C 3 2 5 1
>>> print(df2)
a b c d
A 0 9 3 12
B 5 5 4 3
C 10 1 2 0
I would like to get the value from each column of df1
that corresponds to (i.e. is at the same coordinates as) the maximal value of that same column in df2
. So in the above example it should return the values [3, 2, 1, -1]
. I was able to get the correct indices using idxmax
:
>>> print(df2.idxmax())
a C
b A
c B
d A
dtype: object
As you can see, these are indeed the indices corresponding to the column-wise maximums in df2
. However trying to index into df1
using these indices does not return the desired result:
>>> print(df1.loc[df2.idxmax()])
a b c d
C 3 2 5 1
A 1 2 3 -1
B 2 4 1 0
A 1 2 3 -1
This indexing seems to only use the values of the Series
returned by idxmax
. How do I correctly index df1
using both parts (labels and values) of the returned Series
? Or is there maybe a simpler solution to achieve what I want?
CodePudding user response:
Try this:
s = df2.idxmax()
[df1.loc[row, col] for col, row in s.items()]
CodePudding user response:
If you want a vectorial solution (not a loop), you can use numpy:
import numpy as np
a = df1.to_numpy()[np.argmax(df2.to_numpy(), axis=0),
np.arange(df1.shape[1])]
# array([ 3, 2, 1, -1])
# as Series
out = pd.Series(a, index=df1.columns)
output:
a 3
b 2
c 1
d -1
dtype: int64
If you have only 1 max value per column, you can use this pandas alternative:
out = df1.where(df2.eq(df2.max())).T.stack()
output:
a C 3.0
b A 2.0
c B 1.0
d A -1.0
dtype: float64
NB. if you want you can droplevel(-1)
for the same output as the first approach.