Home > Software design >  Sort columns based on min value of each column in pandas
Sort columns based on min value of each column in pandas

Time:08-26

I am working with a dataset that is of the form

stops   stops name    0            1           2           3
1           A       21:09:00    17:24:00    17:54:00    17:29:00
2           B       21:10:00    17:25:00    17:55:00    17:27:00
3           C                   17:28:00    17:58:00    17:26:00
4           D       21:16:00                18:01:00    17:23:00
5           E       21:17:00    17:32:00    18:02:00    
6           F       21:20:00    17:35:00    18:05:00    17:20:00

I know how to sort columns [0-3] according to the times of a specific stop. For example, to sort based on the times of the first stop(A) I do below:

def time_to_seconds(time):
    try:
        h,m,s = time.split(':')
        return int(h) * 3600   int(m) * 60   int(s)
    except:
        return -1

def time_cmp(l):
    return [ time_to_seconds(time) for time in l ]

df[df.columns[2:]] = df[df.columns[2:]].sort_values(by=[0],axis=1,key=time_cmp)

It works, and values are stored:

stops   stops name    0            1           2           3
1           A       17:24:00    17:29:00    17:54:00    21:09:00
2           B       17:25:00    17:27:00    17:55:00    21:10:00
3           C       17:28:00    17:26:00    17:58:00            
4           D                   17:23:00    18:01:00    21:16:00
5           E       17:32:00                18:02:00    21:17:00
6           F       17:35:00    17:20:00    18:05:00    21:20:00

However, I want to sort the columns based on the minimum value for each column, not a specific row(or stop). What should I do?

Result of the desired sort:

stops   stops name    0            1           2           3
1           A       17:29:00    17:24:00    17:54:00    21:09:00
2           B       17:27:00    17:25:00    17:55:00    21:10:00
3           C       17:26:00    17:28:00    17:58:00            
4           D       17:23:00                18:01:00    21:16:00
5           E                   17:32:00    18:02:00    21:17:00
6           F       17:20:00    17:35:00    18:05:00    21:20:00

As you can see min value of column 0(17:20:00) is less than the min value of column 1(17:24:00).

CodePudding user response:

You can convert to_timedelta, then use numpy to get the sorted order:

import numpy as np
cols = df.columns[2:]
order = np.argsort(df[cols].apply(pd.to_timedelta).min())
df[cols].iloc[:, order]
# or as one-liner
# df[cols].iloc[:, np.argsort(df[cols].apply(pd.to_timedelta).min())]

output:

          3         1         2         0
0  17:29:00  17:24:00  17:54:00  21:09:00
1  17:27:00  17:25:00  17:55:00  21:10:00
2  17:26:00  17:28:00  17:58:00       NaN
3  17:23:00       NaN  18:01:00  21:16:00
4       NaN  17:32:00  18:02:00  21:17:00
5  17:20:00  17:35:00  18:05:00  21:20:00

If you want to replace the values, keeping the column names intact:

cols = df.columns[2:]
order = np.argsort(df[cols].apply(pd.to_timedelta).min())
df[cols] = df[cols].iloc[:, order]

output:

   stops stops name         0         1         2         3
0      1          A  17:29:00  17:24:00  17:54:00  21:09:00
1      2          B  17:27:00  17:25:00  17:55:00  21:10:00
2      3          C  17:26:00  17:28:00  17:58:00       NaN
3      4          D  17:23:00       NaN  18:01:00  21:16:00
4      5          E       NaN  17:32:00  18:02:00  21:17:00
5      6          F  17:20:00  17:35:00  18:05:00  21:20:00

CodePudding user response:

Here is how you can do it-

my_df = pd.DataFrame({'stops':[1,2,3,4,5,6],
                      'stops name':['A','B','C','D','E','F'],
                      '0':['17:29:00','17:27:00','17:26:00','17:23:00','','17:20:00'],
                     '1':['17:24:00','17:25:00','17:28:00','','17:32:00','17:35:00'],
                     '2':['17:54:00','17:55:00','17:58:00','18:01:00','18:02:00','18:05:00'],
                     '3':['21:09:00','21:10:00','','21:16:00','21:17:00','21:20:00']})

my_df = my_df.sort_values(by=['0','1','2','3'], ascending=[False, False, False, False])
my_df

Output-

    stops   stops name  0   1   2   3
0   1   A   17:29:00    17:24:00    17:54:00    21:09:00
1   2   B   17:27:00    17:25:00    17:55:00    21:10:00
2   3   C   17:26:00    17:28:00    17:58:00    
3   4   D   17:23:00        18:01:00    21:16:00
5   6   F   17:20:00    17:35:00    18:05:00    21:20:00
4   5   E       17:32:00    18:02:00    21:17:00

enter image description here

  • Related