I have a pandas DataFrame like this:
import pandas as pd
df = pd.DataFrame({'feature1': ['a', 'b', 'a', 'c', 'b', 'c'],
'feature2': ['b', 'a', 'c', 'a', 'c', 'a'],
'val': [0.96, 0.96, 0.95, 0.95, 0.94, 0.94]})
df
feature1 feature2 val
0 a b 0.96
1 b a 0.96
2 a c 0.95
3 c a 0.95
4 b c 0.94
5 c a 0.94
I want to sort the values within each row, so that feature1 holds the lexicographically smaller of the two values and feature2 holds the larger one. The resulting DataFrame should look like this:
feature1 feature2 val
0 a b 0.96
1 a b 0.96
2 a c 0.95
3 a c 0.95
4 b c 0.94
5 a c 0.94
Answer:
You can sort the values within each row with NumPy:
import numpy as np
df[['feature1','feature2']] = np.sort(df[['feature1','feature2']], axis=1)
print(df)
feature1 feature2 val
0 a b 0.96
1 a b 0.96
2 a c 0.95
3 a c 0.95
4 b c 0.94
5 a c 0.94
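To see what np.sort(..., axis=1) does on its own, here is a minimal sketch on a small hand-made object array (the array `arr` is illustrative, not from the question). Each row is sorted independently, so the smaller string ends up in the first column and the larger one in the second.

```python
import numpy as np

# Illustrative 2x2 object array of strings, one pair per row
arr = np.array([['b', 'a'],
                ['a', 'c']], dtype=object)

# axis=1 sorts within each row and leaves the row order unchanged
sorted_rows = np.sort(arr, axis=1)
print(sorted_rows.tolist())  # [['a', 'b'], ['a', 'c']]
```

Because the whole row is sorted, the same call also works with more than two columns: the minimum lands in the first column and the maximum in the last.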
Alternatively, select the feature columns with filter and take the row-wise minimum and maximum:
df[['feature1','feature2']] = df.filter(like='feature').agg(['min','max'], axis=1)
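Another vectorized option, not taken from the answers here, is to compare the two columns element-wise and swap only the rows where feature1 is lexicographically greater than feature2. A minimal sketch, assuming only these two string columns need ordering (the mask name `swap` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'feature1': ['a', 'b', 'a', 'c', 'b', 'c'],
                   'feature2': ['b', 'a', 'c', 'a', 'c', 'a'],
                   'val': [0.96, 0.96, 0.95, 0.95, 0.94, 0.94]})

# Boolean mask: True where feature1 sorts after feature2 (string comparison)
swap = df['feature1'] > df['feature2']

# Swap the two columns in those rows only; .to_numpy() stops .loc from
# re-aligning the right-hand side by column name, which would undo the swap
df.loc[swap, ['feature1', 'feature2']] = df.loc[swap, ['feature2', 'feature1']].to_numpy()
print(df)
```

Unlike np.sort, this only rewrites the rows that are out of order, but it is written specifically for a pair of columns.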
Answer:
I managed to solve this with df.apply as well:
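def custom_sort(x):
    # x is one row (a Series); copy it so the row passed in is not mutated
    x = x.copy()
    min_val = min(x["feature1"], x["feature2"])
    max_val = max(x["feature1"], x["feature2"])
    x["feature1"] = min_val
    x["feature2"] = max_val
    return x

# apply returns a new DataFrame; assign it back to df to keep the result
df.apply(custom_sort, axis=1)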
feature1 feature2 val
0 a b 0.96
1 a b 0.96
2 a c 0.95
3 a c 0.95
4 b c 0.94
5 a c 0.94
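The same row-by-row logic can be written without df.apply by zipping the two columns and sorting each pair with Python's built-in sorted. A minimal sketch (the name `pairs` is illustrative); it avoids constructing a Series for every row, which apply(axis=1) does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'feature1': ['a', 'b', 'a', 'c', 'b', 'c'],
                   'feature2': ['b', 'a', 'c', 'a', 'c', 'a'],
                   'val': [0.96, 0.96, 0.95, 0.95, 0.94, 0.94]})

# sorted() on each (feature1, feature2) pair returns [smaller, larger]
pairs = [sorted(pair) for pair in zip(df['feature1'], df['feature2'])]

# Write the sorted pairs back as a 2-D object array, one row per pair
df[['feature1', 'feature2']] = np.array(pairs, dtype=object)
print(df)
```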