I have the following pandas dataframe:
col1 col2 col3 .... colN
5 2 4 .... 9
1 2 3 .... 9
7 1 4 .... 0
1 4 7 .... 8
What I need is a way to determinate the order between several columns:
col1 col2 col3 .... colN
5 2 4 .... 9 ----> colN >= ... >= col5 >= col2 >= col3
1 2 3 .... 9 ----> colN >= ... >= col3 >= col2 >= col1
7 1 4 .... 0 ----> col1 >= ... >= col3 >= col2 >= colN
1 4 7 .... 8 ----> colN >= ... >= col3 >= col2 >= col1
And give them a numeric alias. For example:
colN >= ... >= col5 >= col2 >= col3 = X
colN >= ... >= col3 >= col2 >= col1 = Y
col1 >= ... >= col3 >= col2 >= colN = Z
:
:
col1 col2 col3 .... colN order
5 2 4 .... 9 X
1 2 3 .... 9 Y
7 1 4 .... 0 Z
1 4 7 .... 8 Y
:
:
The number of columns may change and the alias has to follow a patron. Example with 3 columns:
col1 >= col2 >= col3 = 1
col1 >= col3 >= col2 = 2
col2 >= col1 >= col3 = 3
col2 >= col3 >= col2 = 4
col3 >= col1 >= col2 = 5
col3 >= col2 >= col1 = 6
Thanks and regards
CodePudding user response:
You can use:
df['order'] = df.apply(lambda x: '>='.join(x.sort_values(ascending=False).index), axis=1)
df['alias'] = df.groupby('order').ngroup() 1
Input
col1 col2 col3
0 5 2 4
1 1 2 3
2 7 1 4
3 1 4 7
Output:
col1 col2 col3 order alias
0 5 2 4 col1>=col3>=col2 1
1 1 2 3 col3>=col2>=col1 2
2 7 1 4 col1>=col3>=col2 1
3 1 4 7 col3>=col2>=col1 2
Or for specific pattern:
alias_pattern = {'col1>=col3>=col2' : 2, 'col3>=col2>=col1' : 5}
df['alias'] = df['order'].map(alias_pattern)
Output:
col1 col2 col3 order alias
0 5 2 4 col1>=col3>=col2 2
1 1 2 3 col3>=col2>=col1 5
2 7 1 4 col1>=col3>=col2 2
3 1 4 7 col3>=col2>=col1 5