import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(15, 3)), columns=list('NMO'))
df['Category1'] = ['I','I','I','I','I','G','G','G','G','G','P','P','I','I','P']
df['Category2'] = ['W','W','C','C','C','W','W','W','W','W','O','O','O','O','O']
Imagine this df is much larger, with many more categories. How might I sort it by a predetermined order while keeping every row intact? For example, sorting the df only by 'Category1' so that all the P's come first, then the I's, then the G's.
CodePudding user response:
df.sort_values('Category1', ascending=False)
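Note that a descending sort gives the requested result here only because P, I, G happens to be reverse alphabetical order. For an arbitrary order, one possible sketch, assuming pandas 1.1 or later (where sort_values accepts a key argument), is to map each label to its rank. The names order and rank and the small stand-in frame are illustrative, not from the question:

```python
import pandas as pd

# Small stand-in frame; the values are made up for illustration.
df = pd.DataFrame({
    "N": [1, 2, 3, 4, 5, 6],
    "Category1": ["I", "G", "P", "I", "P", "G"],
})

# Desired order of the labels (hypothetical helper names).
order = ["P", "I", "G"]
rank = {label: position for position, label in enumerate(order)}

# key receives the whole column as a Series and must return a Series
# of the same length; here each label is replaced by its rank.
# kind="mergesort" is a stable sort, so rows with equal labels keep
# their original relative order.
result = df.sort_values("Category1", key=lambda s: s.map(rank), kind="mergesort")
print(result)
```

Labels missing from order map to NaN and, with the default na_position, end up at the bottom.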
CodePudding user response:
You can use an ordered categorical type, which lets you define the sort order explicitly:
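# List the categories in the order they should sort.
cat_type = pd.CategoricalDtype(categories=["P", "I", "G"], ordered=True)
# Convert the column; sort_values then follows the category order.
df['Category1'] = df['Category1'].astype(cat_type)
print(df.sort_values(by='Category1'))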
Prints:
     N   M   O Category1 Category2
10  49  37  44         P         O
11  72  64  66         P         O
14  39  98  32         P         O
0   93  12  89         I         W
1   20  74  21         I         W
2   25  22  24         I         C
3   47  11  33         I         C
4   60  16  34         I         C
12   0  90   6         I         O
13  13  35  80         I         O
5   84  64  67         G         W
6   70  47  83         G         W
7   61  57  76         G         W
8   19   8   3         G         W
9    7   8   5         G         W
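The same idea extends to several columns at once. The following is only a sketch: the order chosen for Category2 ("O", "W", "C"), the orders dict, and the seeded random generator are assumptions made for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (seeded for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(15, 3)), columns=list("NMO"))
df["Category1"] = list("IIIIIGGGGGPPIIP")
df["Category2"] = list("WWCCCWWWWWOOOOO")

# Hypothetical custom orders, one list per column to be sorted.
orders = {
    "Category1": ["P", "I", "G"],
    "Category2": ["O", "W", "C"],
}

# Turn each column into an ordered categorical with its own order.
for column, categories in orders.items():
    df[column] = pd.Categorical(df[column], categories=categories, ordered=True)

# Sort by Category1 first, then by Category2 within each Category1 group.
result = df.sort_values(list(orders))
print(result)
```

Any value that is not listed in the categories becomes NaN during the conversion, so each list should cover every label that occurs in its column.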