I have a DataFrame in Python Pandas like below:
COL1 | COL2 | ... | COLn
------|------|------|-------
aaa | AA_x | ... | ...
bbb | AA_x | ... | ...
ggg | AA_x | ... | ...
ppp | AA_x | ... | ...
aaa | DD_x | ... | ...
ggg | DD_x | ... | ...
ppp | DD_x | ... | ...
bbb | DD_x | ... | ...
.... | ... | ... | ...
COL1 is of string data type, and I need to sort the above DataFrame by the values in COL1 using my own custom order: aaa, bbb, ppp, ggg. As a result, I need something like below:
COL2 | COL1 | ... | COLn
-------|-------|-------|------
AA_x | aaa | ... | ...
AA_x | bbb | ... | ...
AA_x | ppp | ... | ...
AA_x | ggg | ... | ...
DD_x | aaa | ... | ...
DD_x | bbb | ... | ...
DD_x | ppp | ... | ...
DD_x | ggg | ... | ...
... | .... | ... |...
How can I do that in Python Pandas? Do I have to sort this DataFrame manually?
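IIUC, you want to sort by COL1 in your custom order while keeping the original relative order of the repeated values, i.e. the first aaa, bbb, ppp and ggg form the first block, the second occurrences form the next block, and so on.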
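You can use:
import pandas as pd

order = ['aaa', 'bbb', 'ppp', 'ggg']
# ordered categorical: sorting follows `order` instead of alphabetical order
df['COL1'] = pd.Categorical(df['COL1'], categories=order, ordered=True)
# n = running occurrence number of each COL1 value (0 for the first aaa, 1 for the second, ...)
out = (df.assign(n=df.groupby('COL1').cumcount())
         .sort_values(by=['n', 'COL1'])
         .drop(columns='n')
      )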
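To check the approach, here is a self-contained run on sample data taken from the question's table (only COL1 and COL2; the `...` columns are left out). The `observed=True` argument is an addition not present in the snippet above; it does not change the `cumcount` result and only suppresses the FutureWarning that recent pandas versions emit when grouping on a categorical column.

```python
import pandas as pd

# Sample data copied from the question (COL1 and COL2 only; other columns omitted)
df = pd.DataFrame({
    'COL1': ['aaa', 'bbb', 'ggg', 'ppp', 'aaa', 'ggg', 'ppp', 'bbb'],
    'COL2': ['AA_x'] * 4 + ['DD_x'] * 4,
})

order = ['aaa', 'bbb', 'ppp', 'ggg']
df['COL1'] = pd.Categorical(df['COL1'], categories=order, ordered=True)

# n numbers the repeats of each COL1 value in original row order: 0, 0, 0, 0, 1, 1, 1, 1
out = (df.assign(n=df.groupby('COL1', observed=True).cumcount())
         .sort_values(by=['n', 'COL1'])
         .drop(columns='n'))

# COL1 now reads aaa, bbb, ppp, ggg, aaa, bbb, ppp, ggg
print(out[['COL2', 'COL1']].to_string(index=False))
```

Note that any COL1 value missing from `order` becomes NaN after the `pd.Categorical` conversion, so `order` should list every value that occurs in the column.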
If you already have a secondary column to sort by (e.g. COL2):
import pandas as pd

order = ['aaa', 'bbb', 'ppp', 'ggg']
df['COL1'] = pd.Categorical(df['COL1'], categories=order, ordered=True)
# sort by COL2 first, then by the custom COL1 order within each COL2 value
out = df.sort_values(by=['COL2', 'COL1'])
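If you would rather not convert COL1 to a categorical, a possible alternative (a different technique from the answer above, requiring pandas 1.1 or newer for the `key` argument of `sort_values`) is to map each COL1 value to its position in `order` inside a sort key. `rank` and `sort_key` are illustrative names, and the sample data is again taken from the question.

```python
import pandas as pd

# Sample data copied from the question; COL1 stays a plain string column here
df = pd.DataFrame({
    'COL1': ['aaa', 'bbb', 'ggg', 'ppp', 'aaa', 'ggg', 'ppp', 'bbb'],
    'COL2': ['AA_x'] * 4 + ['DD_x'] * 4,
})

order = ['aaa', 'bbb', 'ppp', 'ggg']
# rank lookup: aaa -> 0, bbb -> 1, ppp -> 2, ggg -> 3
rank = {value: i for i, value in enumerate(order)}


def sort_key(col: pd.Series) -> pd.Series:
    # sort_values calls the key once per column in `by`;
    # only COL1 is replaced by its rank, COL2 sorts alphabetically as-is
    return col.map(rank) if col.name == 'COL1' else col


out = df.sort_values(by=['COL2', 'COL1'], key=sort_key)
print(out.to_string(index=False))
```

Values not listed in `order` map to NaN and end up last within their COL2 group, because `sort_values` places NaN last by default.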