I have data in the following format. The stage column was created with
df['stage'] = pd.Categorical(df['stage'], categories=['Undef', 'C', 'A', 'B'], ordered=True)
to impose the ordering Undef < C < A < B. Printing df['stage'] confirms this ordering.
stage time
0 A 2012-01-26 21:52:56
1 A 2012-01-26 21:53:26
2 A 2012-01-26 21:53:56
3 A 2012-01-26 21:54:26
4 A 2012-01-26 21:54:56
.. ... ...
953 B 2012-01-27 05:49:26
954 B 2012-01-27 05:49:56
955 B 2012-01-27 05:50:26
956 C 2012-01-27 05:50:56
957 Undef 2012-01-27 05:51:26
This is how I plot the data:
fig, ax = plt.subplots()
ax.plot(df.time, df.stage)
plt.show()
The stages should be on the y-axis in the specified order, but instead they appear in the order of their first appearance in the dataframe.
What am I doing wrong?
Edit: This is what the plot currently looks like:
All I want to do is to change the order of the y-labels to a custom order (and of course the corresponding plotted values).
Edit 2: When I add the line df.sort_values(['stage'], inplace=True), the y-labels appear in the correct order, but the resulting plot cannot be accurate, because there can only be one stage at a time.
CodePudding user response:
The code below does what you are looking for. As your plot shows, the stage labels are placed on the y-axis in order of first appearance rather than in the categorical order, so you need to convert the labels to integers for plotting and then map the ticks back to the labels. First, a dictionary defines the mapping of Undef, C, A, B to integer positions in the order you require. Then a list of the converted values is created and plotted. Finally, the y-axis ticks and tick labels are set from the dictionary.
My data (input)
stage time
0 A 2012-01-26 21:52:56
1 A 2012-01-26 21:53:26
2 A 2012-01-26 21:53:56
3 A 2012-01-26 21:54:26
4 A 2012-01-26 21:55:56
5 B 2012-01-26 21:56:56
6 B 2012-01-26 21:57:56
7 B 2012-01-26 21:58:56
8 B 2012-01-26 21:59:56
9 B 2012-01-26 22:00:56
10 C 2012-01-26 22:01:56
11 C 2012-01-26 22:02:56
12 C 2012-01-26 22:03:56
13 C 2012-01-26 22:04:56
14 Undef 2012-01-26 22:05:56
15 Undef 2012-01-26 22:06:56
16 Undef 2012-01-26 22:07:56
Plot without any sorting (your code equivalent)
Updated code
import matplotlib.pyplot as plt

# Map each stage to its position in the desired order: Undef < C < A < B
convert = {"Undef": 0, "C": 1, "A": 2, "B": 3}
# Translate every stage label into its integer position
stage_converted = [convert[s] for s in df.stage]
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df.time, stage_converted)
# Update ticks and labels for the y-axis
ax.set_yticks(list(convert.values()))
ax.set_yticklabels(list(convert.keys()))
plt.show()
...and finally the updated plot
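Since the stage column in the question is already an ordered Categorical, the manual dictionary could probably be skipped by plotting the category codes directly. The following is only a minimal sketch of that variant, not part of the original answer: the six-row dataframe is made-up sample data in the same shape as above, and the name order is introduced here for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Desired bottom-to-top order on the y-axis (same as in the question)
order = ["Undef", "C", "A", "B"]

# Made-up sample data, only to keep the sketch self-contained
df = pd.DataFrame({
    "stage": ["A", "A", "B", "B", "C", "Undef"],
    "time": pd.date_range("2012-01-26 21:52:56", periods=6, freq="30s"),
})
df["stage"] = pd.Categorical(df["stage"], categories=order, ordered=True)

fig, ax = plt.subplots(figsize=(10, 5))
# .cat.codes is each row's position in `categories`: Undef=0, C=1, A=2, B=3
ax.plot(df["time"], df["stage"].cat.codes)
# Put one tick per category and label it with the category name
ax.set_yticks(range(len(order)))
ax.set_yticklabels(list(df["stage"].cat.categories))
```

Call plt.show() afterwards to display the figure. The rows stay in time order, so unlike the sort_values attempt, the line still shows one stage at a time.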