Having an error message when trying to sort a dataset in customized list

Time:11-26

I'm using Python to organize an imported CSV file. The dataset I have looks like this:

      Name     Style  ID
0    heels  High end   1
1  sneaker    Middle   0
2      top  High end   3
3    skirt   Low end   6
4    dress  High end   4
5  sweater   Low end   9
6      hat      N/A.   2
...

I am trying to sort the dataset like this, so that High end, Middle and Low end come first (in that order) and all other styles follow:

      Name     Style  ID
0    heels  High end   1
1  sneaker  High end   3
2      top  High end   4
3    skirt    Middle   0
4    dress   Low end   6
5  sweater   Low end   9
6      hat      N/A.   2
...

I tried this code

sort_order = {'High End':0,
              'Middle':1, 'Low end':2,}
Clothing_Df['Style'].apply(lambda x: sort_order[x])

I get this error:
---> 3 Clothing_Df['Style'].apply(lambda x: sort_order[x])

TypeError: list indices must be integers or slices, not str

I've also tried:

sortlist = ['High End':0,
            'Middle':1, 'Low end':2,]
sorted(Clothing_Df['Style'], key= sortlist)

This returns the same TypeError.

I am not sure how best to tackle this. It is a very large dataset, and I just need a way to sort it in a custom order. Any help is appreciated, thank you.
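Both attempts above are close to a workable idea: rank each style with a dictionary and sort by that rank. As written, though, the dictionary keys do not match the data's capitalization ('High End' vs 'High end'), styles missing from the dictionary (such as 'N/A.') have no rank, the mapped ranks from apply are never used to reorder the rows, the second attempt's [key: value] list is not valid Python, and sorted(..., key=...) expects a function rather than a list. A minimal sketch of the dictionary approach, assuming pandas 1.1 or later (needed for the key argument of DataFrame.sort_values) and using the sample rows from the question in place of the real CSV:

```python
import pandas as pd

# Sample rows from the question, standing in for the imported CSV
Clothing_Df = pd.DataFrame({
    'Name': ['heels', 'sneaker', 'top', 'skirt', 'dress', 'sweater', 'hat'],
    'Style': ['High end', 'Middle', 'High end', 'Low end', 'High end', 'Low end', 'N/A.'],
    'ID': [1, 0, 3, 6, 4, 9, 2],
})

# Keys must match the data exactly, including capitalization ('High end', not 'High End')
sort_order = {'High end': 0, 'Middle': 1, 'Low end': 2}

# key= receives the whole Style column as a Series; styles missing from the dict
# map to NaN, and fillna gives them a rank after every listed style
sorted_df = Clothing_Df.sort_values(
    'Style',
    key=lambda s: s.map(sort_order).fillna(len(sort_order)),
    kind='stable',  # keep the original row order within each style
)
print(sorted_df)
```

Because the key works on the whole column at once (vectorized map rather than a Python-level lambda per row), this should stay reasonably fast on a large DataFrame.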

Answer:

Use pd.Categorical with an ordered list of categories to define a custom sort order.

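import pandas as pd

# Known styles first (High end, Middle, Low end); other styles keep their original relative order after them
style_list = df['Style'].unique()
sort_order = sorted(style_list, key=lambda x: (x == 'High end', x == 'Middle', x == 'Low end'), reverse=True)
# An ordered Categorical sorts by category position, not alphabetically
df['Style'] = pd.Categorical(df['Style'], categories=sort_order, ordered=True)
df.sort_values('Style', inplace=True, kind='stable')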

Output:

> df

      Name     Style  ID
0    heels  High end   1
2      top  High end   3
4    dress  High end   4
1  sneaker    Middle   0
3    skirt   Low end   6
5  sweater   Low end   9
6      hat      N/A.   2
7   jacket     Other  10
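The same ordered-Categorical approach as a self-contained sketch, run on the rows shown in the output above. Instead of the tuple-key sort, it builds the category list explicitly: the three priority styles first, then every other style in order of first appearance. The name priority is introduced here for illustration only, and kind='stable' is used so rows with the same style keep their original order.

```python
import pandas as pd

# Sample rows taken from the output above
df = pd.DataFrame({
    'Name': ['heels', 'sneaker', 'top', 'skirt', 'dress', 'sweater', 'hat', 'jacket'],
    'Style': ['High end', 'Middle', 'High end', 'Low end', 'High end', 'Low end', 'N/A.', 'Other'],
    'ID': [1, 0, 3, 6, 4, 9, 2, 10],
})

# Hypothetical helper list: styles that must come first, in this order
priority = ['High end', 'Middle', 'Low end']

# Any remaining styles follow, in order of first appearance in the data
others = [s for s in df['Style'].unique() if s not in priority]
categories = priority + others

# Sorting an ordered Categorical uses the position in `categories`
df['Style'] = pd.Categorical(df['Style'], categories=categories, ordered=True)
df = df.sort_values('Style', kind='stable')
print(df)
```

Because priority is listed explicitly, a priority style that happens to be absent from the data just becomes an unused category instead of causing an error.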