Home > Back-end >  Sort dataframe based on minimum value of two columns
Sort dataframe based on minimum value of two columns

Time:01-25

Let's assume I have the following dataframe:

import pandas as pd
d = {'col1': [1, 2,3,4], 'col2': [4, 2, 1, 3], 'col3': [1,0,1,1], 'outcome': [1,0,1,0]}
df = pd.DataFrame(data=d)

I want this dataframe sorted by col1 and col2 on the minimum value. The order of the indexes should be 2, 0, 1, 3.

I tried this with df.sort_values(by=['col2', 'col1']), but than it takes the minimum of col1 first and then of col2. Is there anyway to order by taking the minimum of two columns?

CodePudding user response:

Using numpy.lexsort:

order = np.lexsort(np.sort(df[['col1', 'col2']])[:, ::-1].T)

out = df.iloc[order]

Output:

   col1  col2  col3  outcome
2     3     1     1        1
0     1     4     1        1
1     2     2     0        0
3     4     3     1        0

Note that you can easily handle any number of columns:

df.iloc[np.lexsort(np.sort(df[['col1', 'col2', 'col3']])[:, ::-1].T)]

   col1  col2  col3  outcome
1     2     2     0        0
2     3     1     1        1
0     1     4     1        1
3     4     3     1        0

CodePudding user response:

One way (not the most efficient):

idx = df[['col2', 'col1']].apply(lambda x: tuple(sorted(x)), axis=1).sort_values().index

Output:

>>> df.loc[idx]
   col1  col2  col3  outcome
2     3     1     1        1
0     1     4     1        1
1     2     2     0        0
3     4     3     1        0

>>> idx
Int64Index([2, 0, 1, 3], dtype='int64')

CodePudding user response:

you can decorate-sort-undecorate where decoration is minimal and other (i.e., maximal) values per row:

cols = ["col1", "col2"]
(df.assign(_min=df[cols].min(axis=1), _other=df[cols].max(axis=1))
   .sort_values(["_min", "_other"])
   .drop(columns=["_min", "_other"]))

to get

   col1  col2  col3  outcome
2     3     1     1        1
0     1     4     1        1
1     2     2     0        0
3     4     3     1        0

CodePudding user response:

I would compute min(col1, col2) as new column and then sort by it

import pandas as pd
d = {'col1': [1, 2,3,4], 'col2': [4, 2, 1, 3], 'col3': [1,0,1,1], 'outcome': [1,0,1,0]}
df = pd.DataFrame(data=d)
df['colmin'] = df[['col1','col2']].min(axis=1) # compute min
df = df.sort_values(by='colmin').drop(columns='colmin') # sort then drop min
print(df)

gives output

   col1  col2  col3  outcome
0     1     4     1        1
2     3     1     1        1
1     2     2     0        0
3     4     3     1        0
  • Related