Is it possible to order/ran a data frame by a column with an specific order, say I have
col1 col2
v_1 4
v_2 3
v_3 1
And say you want to order as:
col1 col2
v_3 1
v_1 4
v_2 3
As I wanted to order by [3,1,2] in column col1. Simplified example as my df has 42 rows. What I expected is being able to pass a list of values [3,1,2] and sort col1 based on such indicators, so 1 refers to v_1, etc
CodePudding user response:
One way is to create a temporary column order
and have the index of item in list from the column col1
, then sort dataframe on this temporary column, and remove the column before returning the dataframe.
Something like this:
def sortDF(df, lst, colName='col1'):
df['order'] = df[colName].apply(lambda x: lst.index(x))
return df.sort_values(['order']).drop(columns=['order'])
SAMPLE OUTPUT:
>>> df
col1 col2
0 1 4
1 2 3
2 3 1
>>> sortDF(df, [3,1,2], 'col1')
col1 col2
2 3 1
0 1 4
1 2 3
PS: The method above expects that all the values in col1
also exist in the list, otherwise it will throw IndexError
, you can handle that scenario manually if that's not the case, and represent it by NaN
on order
column, then you can use parameter na_position
while sorting the dataframe.
CodePudding user response:
One way is to create a Multiindex, based on a mapping, select the rows, via loc, and drop the extra index after:
mapping = [3, 1, 2]
# create pairing between `col1` and range of numbers :
mapp = pd.Series(df.col1.array, index = range(1, df.col1.size 1))
# append mapped index to df
(df.set_index(df.col1.map(mapp), append=True)
.swaplevel() # move index to the front
.loc[mapping] # reshape order
.droplevel(0) # purpose served, drop it
)
col1 col2
2 v_3 1
0 v_1 4
1 v_2 3