Ordering of categorical data labels from pandas dataframe in plot

Time:08-06

I have data in the following format, where the column stage was created with df['stage'] = pd.Categorical(df['stage'], categories=['Undef', 'C', 'A', 'B'], ordered=True) to define the ordering Undef < C < A < B. When I print(df['stage']), this ordering is confirmed.

     stage                time
0    A     2012-01-26 21:52:56
1    A     2012-01-26 21:53:26
2    A     2012-01-26 21:53:56
3    A     2012-01-26 21:54:26
4    A     2012-01-26 21:54:56
..   ...                 ...
953  B     2012-01-27 05:49:26
954  B     2012-01-27 05:49:56
955  B     2012-01-27 05:50:26
956  C     2012-01-27 05:50:56
957  Undef 2012-01-27 05:51:26
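A quick way to check what the ordered Categorical actually stores is to look at its categories and integer codes. The snippet below is a minimal sketch on a few illustrative rows (not the full dataset): the codes follow the declared order Undef < C < A < B, not the alphabet or the order of first appearance.

```python
import pandas as pd

# Illustrative sample of the 'stage' column (not the full dataset)
df = pd.DataFrame({"stage": ["A", "A", "B", "C", "Undef"]})

# Declare the custom order Undef < C < A < B
df["stage"] = pd.Categorical(
    df["stage"], categories=["Undef", "C", "A", "B"], ordered=True
)

print(list(df["stage"].cat.categories))  # ['Undef', 'C', 'A', 'B']
# Each value's integer position within the declared order
print(df["stage"].cat.codes.tolist())    # [2, 2, 3, 1, 0]
# min/max respect the declared order
print(df["stage"].min(), df["stage"].max())  # Undef B
```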

This is how I plot the data:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(df.time, df.stage)
plt.show()

The stages should be on the y-axis in the specified order, but they appear in the order in which they first occur in the dataframe.

What am I doing wrong?
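The first-appearance order matches how Matplotlib treats string values on an axis: each distinct string gets the next integer position in the order it is first encountered. The sketch below reproduces this with a plain list of illustrative stage strings and the non-interactive Agg backend; it does not use the asker's data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
stages = ["A", "A", "B", "C", "Undef"]  # illustrative sample, in time order

fig, ax = plt.subplots()
ax.plot(x, stages)  # strings are placed on a categorical y-axis
fig.canvas.draw()   # make sure the tick labels are populated

labels = [t.get_text() for t in ax.get_yticklabels()]
print(labels)  # order of first appearance, bottom to top
plt.close(fig)
```

Here "A" ends up at the bottom simply because it comes first in the data, even though "Undef" would be first in the declared category order.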

Edit: This is what the plot currently looks like:

[Image: current plot (not available)]

All I want to do is to change the order of the y-labels to a custom order (and of course the corresponding plotted values).

Edit 2: When I add the line df.sort_values(['stage'], inplace=True), the plot changes as shown below. This cannot be accurate, as there can only be one stage at a time (although the y-labels are now in the correct order).

[Image: plot after sorting by stage (not available)]
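The sorted plot goes wrong because sort_values reorders the rows themselves: on a Categorical column it sorts by the declared category order, so the time column is no longer ascending, and ax.plot then connects consecutive rows that jump back and forth in time. A minimal sketch with made-up timestamps:

```python
import pandas as pd

# Illustrative data in time order (timestamps are made up)
df = pd.DataFrame({
    "stage": ["A", "B", "C", "Undef", "A"],
    "time": pd.to_datetime([
        "2012-01-26 21:52:56", "2012-01-26 21:53:26", "2012-01-26 21:53:56",
        "2012-01-26 21:54:26", "2012-01-26 21:54:56",
    ]),
})
df["stage"] = pd.Categorical(
    df["stage"], categories=["Undef", "C", "A", "B"], ordered=True
)

# Sorts rows by Undef < C < A < B, not by time
sorted_df = df.sort_values("stage")

print(sorted_df["stage"].tolist())                # ['Undef', 'C', 'A', 'A', 'B']
print(sorted_df["time"].is_monotonic_increasing)  # False: rows are out of time order
```

So the label order becomes correct, but only because the data points themselves have been shuffled out of chronological order.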

Answer:

Below is code that does what you are looking for. Because the stages are strings, Matplotlib places them on the y-axis in the order they first appear and does not use the pandas category order, so you need to convert the labels to integers for plotting and then map them back for the tick labels. First, a dictionary maps Undef, C, A and B to 0, 1, 2 and 3, matching your required order. Then a list of converted values is built and plotted. Finally, the y-axis ticks and tick labels are updated.
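The conversion step can also build the dictionary from a single list that states the desired order, so the order is written down only once. A minimal sketch in plain Python; the names convert and stage_converted mirror the answer, while order and the sample stages are illustrative assumptions:

```python
# Desired bottom-to-top order of the y-axis labels
order = ["Undef", "C", "A", "B"]

# Map each label to its position: {"Undef": 0, "C": 1, "A": 2, "B": 3}
convert = {label: i for i, label in enumerate(order)}

# Illustrative sequence of stages in time order
stages = ["A", "A", "B", "C", "Undef"]

# Replace each label with its integer position for plotting
stage_converted = [convert[s] for s in stages]
print(stage_converted)  # [2, 2, 3, 1, 0]
```

The y-axis ticks are then range(len(order)) and their labels are order itself.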

My data (input)

stage   time
0   A   2012-01-26 21:52:56
1   A   2012-01-26 21:53:26
2   A   2012-01-26 21:53:56
3   A   2012-01-26 21:54:26
4   A   2012-01-26 21:55:56
5   B   2012-01-26 21:56:56
6   B   2012-01-26 21:57:56
7   B   2012-01-26 21:58:56
8   B   2012-01-26 21:59:56
9   B   2012-01-26 22:00:56
10  C   2012-01-26 22:01:56
11  C   2012-01-26 22:02:56
12  C   2012-01-26 22:03:56
13  C   2012-01-26 22:04:56
14  Undef   2012-01-26 22:05:56
15  Undef   2012-01-26 22:06:56
16  Undef   2012-01-26 22:07:56

Plot without any sorting (equivalent to your code):

[Image: plot without sorting (not available)]

Updated code

import matplotlib.pyplot as plt

# Map each stage to its y-position; the dict order gives the bottom-to-top label order
convert = {"Undef" : 0, "C" : 1, "A" : 2, "B" : 3}
stage_converted = []
for i in df.stage :
    stage_converted.append(convert[i])

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df.time, stage_converted)

### Update ticks and labels for y-axis
ax.set_yticks( list(convert.values()) )
ax.set_yticklabels( list(convert.keys()) )

plt.show()

And finally, the updated plot:

[Image: updated plot with y-labels in the order Undef, C, A, B (not available)]
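Since the order is already declared with pd.Categorical, an alternative (a swapped-in technique, not the answer's method) is to plot the category codes directly and label the ticks with the categories, so no separate dictionary is needed. A minimal sketch with illustrative data and the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data in time order
df = pd.DataFrame({
    "stage": ["A", "A", "B", "C", "Undef"],
    "time": pd.to_datetime([
        "2012-01-26 21:52:56", "2012-01-26 21:53:26", "2012-01-26 21:53:56",
        "2012-01-26 21:54:26", "2012-01-26 21:54:56",
    ]),
})
df["stage"] = pd.Categorical(
    df["stage"], categories=["Undef", "C", "A", "B"], ordered=True
)
categories = list(df["stage"].cat.categories)

fig, ax = plt.subplots(figsize=(10, 5))
# cat.codes gives each row's position in the declared order (Undef=0, ..., B=3)
ax.plot(df["time"], df["stage"].cat.codes)

# One tick per category, labelled in the declared order (bottom to top)
ax.set_yticks(range(len(categories)))
ax.set_yticklabels(categories)

fig.canvas.draw()
labels = [t.get_text() for t in ax.get_yticklabels()]
print(labels)  # ['Undef', 'C', 'A', 'B']
plt.close(fig)
```

With this approach, changing the categories list in pd.Categorical changes both the plotted values and the tick labels together.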
