How to reorder pandas dataframe based off list containing column order

Time:06-03

Say I have a dataframe 'df' that contains a list of files and their contents:

File          Field          Folder
Users.csv       Age      UserFolder
Users.csv      Name      UserFolder
Cars.csv      Color       CarFolder
Cars.csv      Model       CarFolder

How can I reorder this df if I have lists giving the desired order of the 'Field' values for each file?

users_col_order = ['Name', 'Age']
cars_col_order = ['Model', 'Color']

So that the resulting df is reordered like so (I am not trying to just sort 'Field' in reverse alphabetical order; that this example looks that way is just a coincidence):

File          Field          Folder
Users.csv      Name      UserFolder
Users.csv       Age      UserFolder
Cars.csv      Model       CarFolder
Cars.csv      Color       CarFolder

CodePudding user response:

First, put your new orders in a dictionary:

mapping = {
    'Users': ['Name', 'Age'],
    'Cars': ['Model', 'Color'],
}

Then create a temporary column that holds those values, placed according to the File values. Finally, make Field the index and select rows from it using the new column:

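# Remember the original column order so it can be restored at the end
original_cols = df.columns

# For each file, write its desired Field order into a temporary column
for k, v in mapping.items():
    df.loc[df['File'] == k + '.csv', 'tmp'] = v

# Index by Field, pick rows in the order given by tmp, then drop tmp
df = df.set_index('Field').loc[df['tmp']].reset_index().drop('tmp', axis=1)[original_cols]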

Output:

>>> df
        File  Field      Folder
0  Users.csv   Name  UserFolder
1  Users.csv    Age  UserFolder
2   Cars.csv  Model   CarFolder
3   Cars.csv  Color   CarFolder
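
The indexing approach above relies on each Field value appearing only once in the whole dataframe. As a rough alternative sketch, assuming the same `mapping` dictionary and that every (File, Field) pair in df appears in it, you could compute a sort position per row instead. The names `rank`, `field_rank`, `file_rank` and `result` are made up for this illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'File': ['Users.csv', 'Users.csv', 'Cars.csv', 'Cars.csv'],
    'Field': ['Age', 'Name', 'Color', 'Model'],
    'Folder': ['UserFolder', 'UserFolder', 'CarFolder', 'CarFolder'],
})

mapping = {
    'Users': ['Name', 'Age'],
    'Cars': ['Model', 'Color'],
}

# Hypothetical lookup: (file name, field) -> position within that file's order
rank = {
    (name + '.csv', field): pos
    for name, fields in mapping.items()
    for pos, field in enumerate(fields)
}

# Position of each row's Field inside its own file's desired order
field_rank = pd.Series(
    [rank[(f, c)] for f, c in zip(df['File'], df['Field'])],
    index=df.index,
)

# Position of each file by first appearance, so files keep their original order
file_rank = pd.Series(pd.factorize(df['File'])[0], index=df.index)

# Sort the helper keys, then reuse the resulting row order on df
order = (
    pd.DataFrame({'file': file_rank, 'field': field_rank})
    .sort_values(['file', 'field'])
    .index
)
result = df.loc[order].reset_index(drop=True)
print(result)
```

Because the key is the (File, Field) pair, the same field name could appear under several files without clashing.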

CodePudding user response:

Use pd.Categorical with ordered=True:

categories = users_col_order + cars_col_order

df['Field'] = pd.Categorical(values=df['Field'],
                             categories=categories,
                             ordered=True)
df.sort_values(by='Field')

File          Field          Folder
Users.csv      Name      UserFolder
Users.csv       Age      UserFolder
Cars.csv      Model       CarFolder
Cars.csv      Color       CarFolder

If you want to, you can always create a new column Field_categorical to preserve the original values in Field.
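
A minimal sketch of that variant, assuming the same two order lists as in the question. Note that pandas requires the categories to be unique, so this only works when no field name is shared between the two lists. The name `sorted_df` is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'File': ['Users.csv', 'Users.csv', 'Cars.csv', 'Cars.csv'],
    'Field': ['Age', 'Name', 'Color', 'Model'],
    'Folder': ['UserFolder', 'UserFolder', 'CarFolder', 'CarFolder'],
})

users_col_order = ['Name', 'Age']
cars_col_order = ['Model', 'Color']
categories = users_col_order + cars_col_order

# Keep 'Field' as plain strings and put the ordered categorical in a helper column
df['Field_categorical'] = pd.Categorical(
    df['Field'], categories=categories, ordered=True
)

# Sort by the helper column, then drop it so only the original columns remain
sorted_df = (
    df.sort_values('Field_categorical')
      .drop(columns='Field_categorical')
      .reset_index(drop=True)
)
print(sorted_df)
```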

CodePudding user response:

# Clothing sizes are a good example of a custom sort order,
# because XL sits at the opposite end from XS rather than right after it.

import pandas as pd
from pandas.api.types import CategoricalDtype

# Create the DataFrame
df = pd.DataFrame({
    'cloth_id': [1001, 1002, 1003, 1004, 1005],
    'size': ['S', 'XL', 'M', 'XS', 'L'],
})

# Define your own order as an ordered categorical dtype
cat_size_order = CategoricalDtype(
    ['XS', 'S', 'M', 'L', 'XL'],
    ordered=True
)

# Call astype(cat_size_order) to cast the size data to the custom category type.
# Inspecting df['size'] afterwards shows that the column has been cast to a
# category type with the order [XS < S < M < L < XL].
df['size'] = df['size'].astype(cat_size_order)

# Sort by the custom order
df.sort_values('size')
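
To check the effect, here is a small self-contained sketch that reuses the size example. It prints the sorted sizes and also shows that an ordered categorical can be compared against one of its categories. The names `sorted_df` and `larger_than_m` are only for illustration:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({
    'cloth_id': [1001, 1002, 1003, 1004, 1005],
    'size': ['S', 'XL', 'M', 'XS', 'L'],
})
cat_size_order = CategoricalDtype(['XS', 'S', 'M', 'L', 'XL'], ordered=True)
df['size'] = df['size'].astype(cat_size_order)

# Sizes follow the custom order rather than alphabetical order
sorted_df = df.sort_values('size')
print(sorted_df['size'].tolist())  # ['XS', 'S', 'M', 'L', 'XL']

# Ordered categoricals also support comparisons against a category value
larger_than_m = df[df['size'] > 'M']
print(larger_than_m['cloth_id'].tolist())  # [1002, 1005]
```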