How to custom sort a pandas dataframe per row by comparing values in columns?

I have a pandas dataframe like this:

import pandas as pd 
df = pd.DataFrame({'feature1': ['a', 'b', 'a', 'c', 'b', 'c'],
                   'feature2': ['b', 'a', 'c', 'a', 'c', 'a'],
                   'val': [0.96, 0.96, 0.95, 0.95, 0.94, 0.94]})
df
    feature1    feature2    val
0   a   b   0.96
1   b   a   0.96
2   a   c   0.95
3   c   a   0.95
4   b   c   0.94
5   c   a   0.94

I want to sort the values within each row, so that feature1 holds the lexicographically smaller of the two feature values and feature2 holds the larger one. The resulting dataframe should look like this:

    feature1    feature2    val
0   a   b   0.96
1   a   b   0.96
2   a   c   0.95
3   a   c   0.95
4   b   c   0.94
5   a   c   0.94

CodePudding user response:

You can sort the two feature columns within each row by using np.sort with axis=1:

import numpy as np

df[['feature1','feature2']] = np.sort(df[['feature1','feature2']], axis=1)
print (df)
  feature1 feature2   val
0        a        b  0.96
1        a        b  0.96
2        a        c  0.95
3        a        c  0.95
4        b        c  0.94
5        a        c  0.94
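
For anyone who wants to run the np.sort approach end to end, here is a minimal self-contained sketch. The helper name sort_pair_columns is hypothetical (it is not part of pandas or NumPy), and it works on a copy so the original frame is left untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'feature1': ['a', 'b', 'a', 'c', 'b', 'c'],
                   'feature2': ['b', 'a', 'c', 'a', 'c', 'a'],
                   'val': [0.96, 0.96, 0.95, 0.95, 0.94, 0.94]})

def sort_pair_columns(frame, cols):
    """Hypothetical helper: return a copy with `cols` sorted within each row."""
    out = frame.copy()
    # np.sort with axis=1 orders the values of each row independently
    out[cols] = np.sort(frame[cols].to_numpy(), axis=1)
    return out

result = sort_pair_columns(df, ['feature1', 'feature2'])
print(result)
```

The same helper should work for more than two columns, as long as the values in each row are mutually comparable.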

Or select the feature columns with filter and take the row-wise minimum and maximum:

df[['feature1','feature2']] = df.filter(like='feature').agg(['min','max'], axis=1)

CodePudding user response:

Managed to work this out using df.apply as well.

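def custom_sort(x):
    # Work on a copy so the row handed over by apply is not mutated
    x = x.copy()
    min_val = min(x["feature1"], x["feature2"])
    max_val = max(x["feature1"], x["feature2"])
    x["feature1"] = min_val
    x["feature2"] = max_val
    return x

# apply returns a new dataframe, so assign the result back
df = df.apply(custom_sort, axis=1)
print(df)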
    feature1    feature2    val
0   a   b   0.96
1   a   b   0.96
2   a   c   0.95
3   a   c   0.95
4   b   c   0.94
5   a   c   0.94
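
If the apply route is preferred, one possible variant builds the sorted pair per row instead of rewriting the row inside the function. This is only a sketch: the variable names are illustrative, and it relies on result_type='expand' to spread each returned list into columns.

```python
import pandas as pd

df = pd.DataFrame({'feature1': ['a', 'b', 'a', 'c', 'b', 'c'],
                   'feature2': ['b', 'a', 'c', 'a', 'c', 'a'],
                   'val': [0.96, 0.96, 0.95, 0.95, 0.94, 0.94]})

cols = ['feature1', 'feature2']

# sorted() builds a new list for each row; result_type='expand'
# turns those lists into a dataframe with one column per element
pairs = df[cols].apply(lambda row: sorted(row), axis=1, result_type='expand')

# Write the sorted values into a copy, leaving df itself unchanged
result = df.copy()
result[cols] = pairs.to_numpy()
print(result)
```

Like any row-wise apply, this runs a Python function once per row, so on large frames the np.sort version is likely to be noticeably faster.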