Pandas: take the minimum of two operations on two dataframes, while preserving index

Time:09-13

I'm a beginner with Pandas. I've got two dataframes df1 and df2 of three columns each, labelled by some index.

I would like to get a third dataframe whose entries are min( df1-df2, 1-df1-df2 ) for each column, while preserving the index.

I don't know how to do this on all three columns at once. If I try e.g. np.min( df1-df2, 1-df1-df2 ) I get TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed, whereas min( df1-df2, 1-df1+df2 ) gives ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I can't use apply because I've got more than one dataframe. Basically, I would like to use something like subtract, but with the ability to define my own function.

Example: consider these two dataframes:

import pandas as pd

df0 = pd.DataFrame( [[0.1,0.2,0.3], [0.3, 0.1, 0.2], [0.1, 0.3, 0.9]], index=[2,1,3], columns=['px', 'py', 'pz'] )

In [4]: df0
Out[4]: 
    px   py   pz
2  0.1  0.2  0.3
1  0.3  0.1  0.2
3  0.1  0.3  0.9

and

df1 = pd.DataFrame( [[0.9,0.1,0.9], [0.1,0.2,0.1], [0.3,0.1,0.8]], index=[3,1,2], columns=['px', 'py', 'pz'])

    px   py   pz
3  0.9  0.1  0.9
1  0.1  0.2  0.1
2  0.3  0.1  0.8

my desired output is a new dataframe df, made up of three columns 'px', 'py', 'pz', whose entries are:

   for j in range(1,4):
       dfx[j-1] = min( df0['px'][j] - df1['px'][j], 1 - df0['px'][j] + df1['px'][j] )

for df['px'], and similarly for 'py' and 'pz'.

    px   py   pz
1  0.2 -0.1  0.1
2 -0.2  0.1 -0.5
3 -0.8  0.2  0.0

I hope it's clear now! Thanks in advance!

CodePudding user response:

pandas is smart enough to match up the columns and index values for you in a vectorized way. If you're looping over a DataFrame, you're probably doing it wrong.
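To make the alignment concrete, here is a minimal sketch using the two frames from the question. The names `diff` and `positional` are introduced only for this illustration; the point is that arithmetic between DataFrames pairs rows by index label, even though the two frames list their rows in a different order.

```python
import pandas as pd

df0 = pd.DataFrame([[0.1, 0.2, 0.3], [0.3, 0.1, 0.2], [0.1, 0.3, 0.9]],
                   index=[2, 1, 3], columns=['px', 'py', 'pz'])
df1 = pd.DataFrame([[0.9, 0.1, 0.9], [0.1, 0.2, 0.1], [0.3, 0.1, 0.8]],
                   index=[3, 1, 2], columns=['px', 'py', 'pz'])

# Label-based: the row labelled 2 in df0 is paired with the row labelled 2 in df1,
# so this entry is 0.1 - 0.3 (about -0.2).
diff = df0 - df1
print(diff.loc[2, 'px'])

# Position-based, for contrast: dropping to raw arrays pairs df0's first row
# (label 2) with df1's first row (label 3), so this entry is 0.1 - 0.9 (about -0.8).
positional = df0.to_numpy() - df1.to_numpy()
print(positional[0, 0])
```

Because the labels drive the pairing, no explicit loop or prior sorting of either frame is needed.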

m1 = df0 - df1
m2 = 1 - (df0 + df1)

# Take the values from m1 where they're less than
# the corresponding value in m2. Otherwise, take m2:
out = m1[m1.lt(m2)].combine_first(m2)

# Another method: Combine our two calculated frames, 
# groupby the index, and take the minimum.
out = pd.concat([m1, m2]).groupby(level=0).min()
print(out)

# Output:

    px   py   pz
1  0.2 -0.1  0.1
2 -0.2  0.1 -0.5
3 -0.8  0.2 -0.8
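A third option, not part of the answer above, is `np.minimum`. The `np.min` call tried in the question reduces a single array along an axis (its second positional argument is `axis`), whereas `np.minimum` compares two inputs element by element. The sketch below assumes the same `m1` and `m2` as above and a reasonably recent pandas, where NumPy ufuncs applied to DataFrames return a DataFrame.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame([[0.1, 0.2, 0.3], [0.3, 0.1, 0.2], [0.1, 0.3, 0.9]],
                   index=[2, 1, 3], columns=['px', 'py', 'pz'])
df1 = pd.DataFrame([[0.9, 0.1, 0.9], [0.1, 0.2, 0.1], [0.3, 0.1, 0.8]],
                   index=[3, 1, 2], columns=['px', 'py', 'pz'])

# Both intermediate frames come out of label-aligned arithmetic,
# so they share the same index and columns.
m1 = df0 - df1
m2 = 1 - (df0 + df1)

# Element-wise minimum of the two frames; index and columns are preserved.
out = np.minimum(m1, m2)
print(out)
```

Note that the question's desired output shows 0.0 for row 3, column 'pz', where this computes -0.8. That 0.0 is what the second expression would give if it were 1 - df0 + df1 rather than 1 - (df0 + df1); if that was the intent, only the `m2` line changes.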