I would like to re-index the columns of a dataframe by sorting them by both alphabetical order and numerical order.
In my dataframe df, my columns are named:
Round 1 Position, Round 2 Position, Round 1 Score, Round 2 Score, Round 1 Price, Round 2 Price, ..., Round 10 Position, Round 10 Score, Round 10 Price
I would like them to be ordered as follows:
Round 1 Position, Round 1 Price, Round 1 Score, Round 2 Position, Round 2 Price, Round 2 Score, Round 3 Position, Round 3 Price, Round 3 Score, ..., Round 10 Position, Round 10 Price, Round 10 Score
I have tried the following:
df.reindex(sorted(df.columns, key=lambda x: float(x[1:])), axis=1)
but had no luck as the column name was a string.
Thank you for taking the time to read through my question and help me. Much appreciated!
CodePudding user response:
Once you sort the column names into their natural order, you can use []
indexing to re-order the columns.
import pandas as pd
import natsort
df = pd.DataFrame({'A2': [1, 2, 3],
'B1': [4, 5, 6],
'A1': [7, 8, 9]})
# sort the column names
col_names = df.columns
sorted_col_names = natsort.natsorted(col_names)
# re-order the columns
df = df[sorted_col_names]
print(df)
Output:
A1 A2 B1
0 7 1 4
1 8 2 5
2 9 3 6
There are several options for sorting alpha-numeric values in the answers to How to sort alpha numeric set in python. I tested the natsort
library, and it works with your input.
columns = ['Round 1 Position', 'Round 2 Position', 'Round 1 Score', 'Round 2 Score', 'Round 1 Price',
'Round 2 Price', 'Round 10 Position', 'Round 10 Score', 'Round 10 Price']
columns_s = natsort.natsorted(columns)
print(columns_s)
Output:
['Round 1 Position', 'Round 1 Price', 'Round 1 Score', 'Round 2 Position', 'Round 2 Price', 'Round 2 Score', 'Round 10 Position', 'Round 10 Price', 'Round 10 Score']