I am trying to make a line plot showing different categories of one column (type) in a dataframe over the years. I have managed to create the lines, but now my x-axis shows the date, plus one of the types and I don't seem to be able to get rid of this. I want it to ONLY show the year on the x-axis.
Here's what I have so far:
ctdf = df.groupby(["year"])["type"].value_counts().unstack(fill_value=0).stack()
print(ctdf)
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
# key gives the group name (i.e. category), data gives the actual values
for key, data in ctdf.groupby('type'):
data.plot(x='year', ax=ax, label=key, legend=True)
plt.xticks(rotation = 45, ha = 'right')
plt.xlabel('year')
This is ctdf:
year type
2002-01-01 Robot 0
Smart Speaker 0
Virtual Agent 1
Voice Assistant 0
2003-01-01 Robot 0
..
2019-01-01 Voice Assistant 0
2020-01-01 Robot 6
Smart Speaker 2
Virtual Agent 3
Voice Assistant 1
Length: 68, dtype: int64
This is the plot I get:
I really don't understand why the x-axis also shows one of the type categories.
CodePudding user response:
Using .reset_index()
to 'fill' your 'year' column fixes this.
To show an example with data:
Without reset_index - just to show same effect as in your plot
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
'product': ['A', 'A', 'A', 'A', 'A', 'B', 'A', 'B', 'A', 'B','A', 'A', 'A', 'A', 'A'],
'sales': [4, 7, 8, 12, 15, 8, 11, 14, 19, 20, 8, 11, 14, 19, 20]})
ctdf = df.groupby('day')['product'].value_counts().unstack(fill_value=0).stack()
ctdf
day product
1 A 2
B 1
2 A 3
B 0
3 A 2
B 1
4 A 3
B 0
5 A 2
B 1
dtype: int64
fig, ax = plt.subplots()
for key, data in ctdf.groupby('product'):
data.plot(x='day', ax=ax, label=key, legend=True)
plt.xticks(rotation = 45, ha = 'right')
plt.xlabel('day')
With reset_index - the expected plot
ctdf = df.groupby('day')['product'].value_counts().unstack(fill_value=0).stack().reset_index()
ctdf
day product 0
0 1 A 2
1 1 B 1
2 2 A 3
3 2 B 0
4 3 A 2
5 3 B 1
6 4 A 3
7 4 B 0
8 5 A 2
9 5 B 1
fig, ax = plt.subplots()
fig_labels = []
for key, data in ctdf.groupby('product'):
# print(key)
# print(data)
fig_labels.append(key) # gather the labels in a list
data.plot(x='day', ax=ax)
plt.xticks(rotation = 45, ha = 'right') # doesn't need to be inside the loop
plt.xlabel('day')
plt.legend(fig_labels)
Note: (thanks for the feedback in the comments)
- It seems the
dataframe.plot
function always falls back to the column name and ignores the explicitelabel=
setting- activate the
# print
statements in the loop to see thatkey
is A & B, but the column name is 0
- activate the
- Actually I don't know the details for that behaviour, but a workaround is to gather the labels in a list and then use matplotlibs
plt.legend
explicitly - as done in the code.- best I could find about that issue is github pandas.DataFrame.plot(): Labels do not appear in legend #9542, it seems that's not sorted completely