How to sort DataFrame by string column with repeating values based on own idea of sort in Python Pan

Time:10-19

I have a DataFrame in Python Pandas like the one below:

COL1  | COL2 | ...  | COLn
------|------|------|-------
aaa   | AA_x | ...  | ...
bbb   | AA_x |  ... | ...
ggg   | AA_x |  ... | ...
ppp   | AA_x |  ... | ...
aaa   | DD_x |  ... | ...
ggg   | DD_x | ...  |  ...
ppp   | DD_x |  ... | ...
bbb   | DD_x |  ... | ...
....  | ...  | ...  | ...

COL1 is a string column, and I need to sort the DataFrame above by the values in COL1 using my own custom order: aaa, bbb, ppp, ggg. As a result I need something like below:

COL2   | COL1  | ...   | COLn
-------|-------|-------|------
AA_x   | aaa   | ...   | ...
AA_x   | bbb   | ...   | ...
AA_x   | ppp   | ...   | ...
AA_x   | ggg   | ...   | ...
DD_x   | aaa   | ...   | ...
DD_x   | bbb   | ...   | ...
DD_x   | ppp   | ...   | ...
DD_x   | ggg   | ...   | ...
...    | ....  |  ...  |...

How can I do that in Python Pandas? I assume this DataFrame has to be sorted manually?

CodePudding user response:

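IIUC, you want to sort by COL1 in your custom order, with the first occurrence of each value (aaa/bbb/etc.) forming the first block, the second occurrence forming the second block, and so on.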

You can use:

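import pandas as pd

# custom sort order for COL1
order = ['aaa', 'bbb', 'ppp', 'ggg']

df['COL1'] = pd.Categorical(df['COL1'], categories=order, ordered=True)

# n numbers the occurrences of each COL1 value (0 = first, 1 = second, ...)
out = (df.assign(n=df.groupby('COL1', observed=True).cumcount())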
         .sort_values(by=['n', 'COL1'])
         .drop(columns='n')
       )
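
To see what this does end to end, here is a minimal sketch on made-up data. The sample frame is an assumption that mirrors the first two columns of the question; `reset_index` is added only to make the result easier to read.

```python
import pandas as pd

# Hypothetical sample data mirroring COL1/COL2 from the question.
df = pd.DataFrame({
    "COL1": ["aaa", "bbb", "ggg", "ppp", "aaa", "ggg", "ppp", "bbb"],
    "COL2": ["AA_x"] * 4 + ["DD_x"] * 4,
})

order = ["aaa", "bbb", "ppp", "ggg"]
df["COL1"] = pd.Categorical(df["COL1"], categories=order, ordered=True)

# n is 0 for the first occurrence of each COL1 value, 1 for the second, ...
n = df.groupby("COL1", observed=True).cumcount()

out = (df.assign(n=n)
         .sort_values(by=["n", "COL1"])   # block number first, then custom order
         .drop(columns="n")
         .reset_index(drop=True))

print(out)
```

Each block of four rows comes out as aaa, bbb, ppp, ggg, and here the blocks line up with AA_x and DD_x because every COL1 value appears exactly once per COL2 group.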

If you already have a secondary column to sort by (e.g. COL2):

order = ['aaa', 'bbb', 'ppp', 'ggg']

df['COL1'] = pd.Categorical(df['COL1'], categories=order, ordered=True)

out = df.sort_values(by=['COL2', 'COL1'])
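
A small self-contained check of the COL2 variant, again on made-up data. The rows are shuffled on purpose (an assumption, not the layout from the question) to show that the result does not depend on the input order.

```python
import pandas as pd

# Hypothetical sample data; rows deliberately mixed across the COL2 groups.
df = pd.DataFrame({
    "COL1": ["ggg", "aaa", "bbb", "ppp", "ppp", "aaa", "ggg", "bbb"],
    "COL2": ["DD_x", "AA_x", "DD_x", "AA_x", "DD_x", "DD_x", "AA_x", "AA_x"],
})

order = ["aaa", "bbb", "ppp", "ggg"]
df["COL1"] = pd.Categorical(df["COL1"], categories=order, ordered=True)

# COL2 sorts alphabetically; within each COL2 group, COL1 follows `order`.
out = df.sort_values(by=["COL2", "COL1"]).reset_index(drop=True)

print(out)
```

One thing to watch: any COL1 value that is missing from `order` becomes NaN when the column is converted with `pd.Categorical`, so the list should cover every value you expect.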