This may sound silly, but I just can't seem to figure it out. I have a Pandas dataframe like this:
N1 N2 N3 N4 N5
0 48 20 45 21 12
1 32 16 29 41 36
2 41 42 34 13 9
3 39 37 4 7 33
4 32 3 1 39 21
... ... ... ... ... ...
1313 1 5 27 36 42
1314 18 20 35 38 48
1315 12 34 37 38 42
1316 18 23 37 41 42
1317 2 10 18 34 35
and I want to sort each row so that its values are rearranged from max to min. I don't want the column labels to change, i.e. it should look like this:
N1 N2 N3 N4 N5
0 48 45 21 20 12
1 41 36 32 29 16
2 42 41 34 13 9
I've tried a for loop with iloc, running through the index one row at a time and applying sort_values, but it doesn't work. Any help?
CodePudding user response:
You can sort each row with numpy.sort, reverse the column order with [:, ::-1] to get descending order, and pass the result to the DataFrame constructor if performance is important:
df = pd.DataFrame(np.sort(df, axis=1)[:, ::-1],
                  columns=df.columns,
                  index=df.index)
print(df)
N1 N2 N3 N4 N5
0 48 45 21 20 12
1 41 36 32 29 16
2 42 41 34 13 9
3 39 37 33 7 4
4 39 32 21 3 1
1313 42 36 27 5 1
1314 48 38 35 20 18
1315 42 38 37 34 12
1316 42 41 37 23 18
1317 35 34 18 10 2
Assigning the result back in place is slightly slower:
df[:] = np.sort(df, axis=1)[:, ::-1]
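For anyone who wants to run the two variants above end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the first five rows of the question, and the names sorted_desc, result and in_place are only illustrative.

```python
import numpy as np
import pandas as pd

# First five rows of the question's frame, typed in by hand for illustration.
df = pd.DataFrame(
    [[48, 20, 45, 21, 12],
     [32, 16, 29, 41, 36],
     [41, 42, 34, 13, 9],
     [39, 37, 4, 7, 33],
     [32, 3, 1, 39, 21]],
    columns=["N1", "N2", "N3", "N4", "N5"],
)

# Sort every row ascending, then reverse the column axis for descending order.
sorted_desc = np.sort(df.to_numpy(), axis=1)[:, ::-1]

# Variant 1: build a new frame that reuses the original labels.
result = pd.DataFrame(sorted_desc, columns=df.columns, index=df.index)

# Variant 2: overwrite the values of a copy in place; labels stay untouched.
in_place = df.copy()
in_place[:] = sorted_desc

print(result)
```

Both variants keep N1..N5 and the row index; only the values inside each row move.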
Performance:
#10k rows
df = pd.concat([df] * 1000, ignore_index=True)
#Ynjxsjmh sol
In [200]: %timeit df.apply(lambda row: list(reversed(sorted(row))), axis=1, result_type='expand')
595 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Andrej Kesely sol1
In [201]: %timeit df[:] = np.fliplr(np.sort(df, axis=1))
559 µs ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Andrej Kesely sol2
In [202]: %timeit df.loc[:, ::-1] = np.sort(df, axis=1)
518 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#jezrael sol2
In [203]: %timeit df[:] = np.sort(df, axis=1)[:, ::-1]
491 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#jezrael sol1
In [204]: %timeit pd.DataFrame(np.sort(df, axis=1)[:, ::-1], columns=df.columns, index=df.index)
399 µs ± 2.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
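The timings above come from IPython's %timeit. Outside IPython, a rough comparison could be sketched with the standard timeit module as below. The random frame, the seed, the row count and the helper names with_numpy and with_apply are assumptions made for illustration, and the numbers will differ from the ones above depending on machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Random stand-in data (assumed); the original benchmark reused the question's frame.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 50, size=(2000, 5)),
                  columns=["N1", "N2", "N3", "N4", "N5"])


def with_numpy(frame):
    # Vectorised: one sort call over the whole 2-D array.
    return pd.DataFrame(np.sort(frame.to_numpy(), axis=1)[:, ::-1],
                        columns=frame.columns, index=frame.index)


def with_apply(frame):
    # Row by row: a Python-level sorted() call for every row.
    return frame.apply(lambda row: sorted(row, reverse=True),
                       axis=1, result_type="expand")


# Average seconds per call over a small number of repetitions.
t_numpy = timeit.timeit(lambda: with_numpy(df), number=20) / 20
t_apply = timeit.timeit(lambda: with_apply(df), number=3) / 3
print(f"numpy: {t_numpy:.6f} s per call, apply: {t_apply:.6f} s per call")
```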
CodePudding user response:
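You can try apply on rows with result_type set to 'expand' or 'broadcast'. Note that 'expand' replaces the column labels with 0..4, while 'broadcast' keeps the original ones: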
df = df.apply(lambda row: list(reversed(sorted(row))), axis=1, result_type='expand')
print(df)
0 1 2 3 4
0 48 45 21 20 12
1 41 36 32 29 16
2 42 41 34 13 9
3 39 37 33 7 4
4 39 32 21 3 1
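Since the question asks for the column labels to stay as they are, here is a small sketch of the same apply call with result_type='broadcast', which retains the original index and columns. The two-row frame and the name out are made up for illustration.

```python
import pandas as pd

# Two rows taken from the question, typed in by hand for illustration.
df = pd.DataFrame(
    [[48, 20, 45, 21, 12],
     [32, 16, 29, 41, 36]],
    columns=["N1", "N2", "N3", "N4", "N5"],
)

# 'broadcast' writes each returned list back into the original shape,
# so the labels N1..N5 are kept instead of being replaced by 0..4.
out = df.apply(lambda row: sorted(row, reverse=True),
               axis=1, result_type="broadcast")
print(out)
```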
CodePudding user response:
Try np.sort:
df[:] = np.fliplr(np.sort(df, axis=1))
print(df)
Prints:
N1 N2 N3 N4 N5
0 48 45 21 20 12
1 41 36 32 29 16
2 42 41 34 13 9
3 39 37 33 7 4
4 39 32 21 3 1
Or:
df.loc[:, ::-1] = np.sort(df, axis=1)
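To see why both spellings give the same result, here is a plain NumPy sketch of the idea. It does not use pandas at all; target is a hypothetical stand-in for the frame's values, and the reversed-slice assignment is only an analogue of the df.loc[:, ::-1] line above.

```python
import numpy as np

# Two rows from the question, sorted ascending within each row.
row_sorted = np.sort(np.array([[48, 20, 45, 21, 12],
                               [32, 16, 29, 41, 36]]), axis=1)

# Two ways to reverse the column order; both give descending rows.
via_fliplr = np.fliplr(row_sorted)
via_slice = row_sorted[:, ::-1]

# Analogue of assigning to reversed columns: the smallest value lands in
# the last column and the largest in the first.
target = np.empty_like(row_sorted)
target[:, ::-1] = row_sorted

print(via_fliplr)
```

All three arrays hold the same descending rows, so the choice between the forms is mostly a matter of readability.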