Home > OS >  Order dataframe by column with specific order from list
Order dataframe by column with specific order from list

Time:10-28

Is it possible to order/ran a data frame by a column with an specific order, say I have

col1 col2
v_1  4
v_2  3
v_3  1 

And say you want to order as:

col1 col2
v_3  1
v_1  4
v_2  3

As I wanted to order by [3,1,2] in column col1. Simplified example as my df has 42 rows. What I expected is being able to pass a list of values [3,1,2] and sort col1 based on such indicators, so 1 refers to v_1, etc

CodePudding user response:

One way is to create a temporary column order and have the index of item in list from the column col1, then sort dataframe on this temporary column, and remove the column before returning the dataframe.

Something like this:

def sortDF(df, lst, colName='col1'):
    df['order'] = df[colName].apply(lambda x: lst.index(x))
    return df.sort_values(['order']).drop(columns=['order'])

SAMPLE OUTPUT:

>>> df
   col1  col2
0     1     4
1     2     3
2     3     1

>>> sortDF(df, [3,1,2], 'col1')
   col1  col2
2     3     1
0     1     4
1     2     3

PS: The method above expects that all the values in col1 also exist in the list, otherwise it will throw IndexError, you can handle that scenario manually if that's not the case, and represent it by NaN on order column, then you can use parameter na_position while sorting the dataframe.

CodePudding user response:

One way is to create a Multiindex, based on a mapping, select the rows, via loc, and drop the extra index after:

mapping = [3, 1, 2]

# create pairing between `col1` and range of numbers :
mapp = pd.Series(df.col1.array, index = range(1, df.col1.size   1))

# append mapped index to df
(df.set_index(df.col1.map(mapp), append=True)
   .swaplevel() # move index to the front
   .loc[mapping] # reshape order
   .droplevel(0) # purpose served, drop it
)
  col1  col2
2  v_3     1
0  v_1     4
1  v_2     3
  • Related