Let's assume I have the following dataframe:
import pandas as pd
d = {'col1': [1, 2,3,4], 'col2': [4, 2, 1, 3], 'col3': [1,0,1,1], 'outcome': [1,0,1,0]}
df = pd.DataFrame(data=d)
I want this dataframe sorted by col1 and col2 on the minimum value. The order of the indexes should be 2, 0, 1, 3.
I tried this with df.sort_values(by=['col2', 'col1'])
, but than it takes the minimum of col1 first and then of col2. Is there anyway to order by taking the minimum of two columns?
CodePudding user response:
Using numpy.lexsort
:
order = np.lexsort(np.sort(df[['col1', 'col2']])[:, ::-1].T)
out = df.iloc[order]
Output:
col1 col2 col3 outcome
2 3 1 1 1
0 1 4 1 1
1 2 2 0 0
3 4 3 1 0
Note that you can easily handle any number of columns:
df.iloc[np.lexsort(np.sort(df[['col1', 'col2', 'col3']])[:, ::-1].T)]
col1 col2 col3 outcome
1 2 2 0 0
2 3 1 1 1
0 1 4 1 1
3 4 3 1 0
CodePudding user response:
One way (not the most efficient):
idx = df[['col2', 'col1']].apply(lambda x: tuple(sorted(x)), axis=1).sort_values().index
Output:
>>> df.loc[idx]
col1 col2 col3 outcome
2 3 1 1 1
0 1 4 1 1
1 2 2 0 0
3 4 3 1 0
>>> idx
Int64Index([2, 0, 1, 3], dtype='int64')
CodePudding user response:
you can decorate-sort-undecorate where decoration is minimal and other (i.e., maximal) values per row:
cols = ["col1", "col2"]
(df.assign(_min=df[cols].min(axis=1), _other=df[cols].max(axis=1))
.sort_values(["_min", "_other"])
.drop(columns=["_min", "_other"]))
to get
col1 col2 col3 outcome
2 3 1 1 1
0 1 4 1 1
1 2 2 0 0
3 4 3 1 0
CodePudding user response:
I would compute min(col1, col2) as new column and then sort by it
import pandas as pd
d = {'col1': [1, 2,3,4], 'col2': [4, 2, 1, 3], 'col3': [1,0,1,1], 'outcome': [1,0,1,0]}
df = pd.DataFrame(data=d)
df['colmin'] = df[['col1','col2']].min(axis=1) # compute min
df = df.sort_values(by='colmin').drop(columns='colmin') # sort then drop min
print(df)
gives output
col1 col2 col3 outcome
0 1 4 1 1
2 3 1 1 1
1 2 2 0 0
3 4 3 1 0