How to fix x-axis showing two values with multi-line plot using matplotlib

I am trying to make a line plot showing how often each category of one column (type) in a dataframe occurs over the years. I have managed to create the lines, but the x-axis now shows the date plus one of the types, and I can't get rid of it. I want the x-axis to show ONLY the year.

Here's what I have so far:

ctdf = df.groupby(["year"])["type"].value_counts().unstack(fill_value=0).stack()
print(ctdf)
from matplotlib import pyplot as plt
fig, ax = plt.subplots()

# key gives the group name (i.e. category), data gives the actual values
for key, data in ctdf.groupby('type'):
    data.plot(x='year', ax=ax, label=key, legend=True)
    plt.xticks(rotation = 45, ha = 'right')
    plt.xlabel('year')

This is ctdf:

year        type           
2002-01-01  Robot              0
            Smart Speaker      0
            Virtual Agent      1
            Voice Assistant    0
2003-01-01  Robot              0
                              ..
2019-01-01  Voice Assistant    0
2020-01-01  Robot              6
            Smart Speaker      2
            Virtual Agent      3
            Voice Assistant    1
Length: 68, dtype: int64

This is the plot I get: [image: plot result showing multiple lines with a legend, but a messy x-axis]

I really don't understand why the x-axis also shows one of the type categories.

User response:

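Calling .reset_index() fixes this. Your ctdf is a Series with a two-level index (year, type), so each group you plot still carries both levels, and both end up in the x-axis tick labels. reset_index() turns those index levels into ordinary columns, so 'year' becomes a real column that x='year' can select.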

To show an example with data:

Without reset_index - just to show the same effect as in your plot

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   'product': ['A', 'A', 'A', 'A', 'A', 'B', 'A', 'B', 'A', 'B','A', 'A', 'A', 'A', 'A'],
                   'sales': [4, 7, 8, 12, 15, 8, 11, 14, 19, 20, 8, 11, 14, 19, 20]})

ctdf = df.groupby('day')['product'].value_counts().unstack(fill_value=0).stack()
ctdf

day  product
1    A          2
     B          1
2    A          3
     B          0
3    A          2
     B          1
4    A          3
     B          0
5    A          2
     B          1
dtype: int64

fig, ax = plt.subplots()

for key, data in ctdf.groupby('product'):
    data.plot(x='day', ax=ax, label=key, legend=True)
    plt.xticks(rotation = 45, ha = 'right')
    plt.xlabel('day')
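
To see where the second value on the x-axis comes from, it can help to look at the index directly. This is a minimal sketch using the same example data as above (the 'sales' column is left out because it is not needed here); the names groups and flat are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   'product': ['A', 'A', 'A', 'A', 'A', 'B', 'A', 'B', 'A', 'B',
                               'A', 'A', 'A', 'A', 'A']})

ctdf = df.groupby('day')['product'].value_counts().unstack(fill_value=0).stack()

# Both 'day' and 'product' are index levels, not columns
print(list(ctdf.index.names))

# Each group keeps the full two-level index, so every x position
# is a (day, product) pair rather than just a day
groups = {key: data for key, data in ctdf.groupby('product')}
print(groups['A'].index.tolist())

# After reset_index the levels become ordinary columns
flat = ctdf.reset_index()
print(flat.columns.tolist())
```

The second print shows pairs such as (1, 'A'), which is what ends up on the axis when the Series is plotted as is.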

With reset_index - the expected plot

ctdf = df.groupby('day')['product'].value_counts().unstack(fill_value=0).stack().reset_index()
ctdf

   day product  0
0    1       A  2
1    1       B  1
2    2       A  3
3    2       B  0
4    3       A  2
5    3       B  1
6    4       A  3
7    4       B  0
8    5       A  2
9    5       B  1

fig, ax = plt.subplots()

fig_labels = []

for key, data in ctdf.groupby('product'):
    # print(key)
    # print(data)
    fig_labels.append(key)  # gather the labels in a list
    data.plot(x='day', ax=ax)

plt.xticks(rotation = 45, ha = 'right')  # doesn't need to be inside the loop
plt.xlabel('day')
plt.legend(fig_labels)

Note (thanks for the feedback in the comments):

It seems the DataFrame.plot function falls back to the column name here and ignores the explicit label= setting.
- Uncomment the print statements in the loop to see that key is A and B, but the column name is 0.

I don't know the details of that behaviour, but a workaround is to gather the labels in a list and then call matplotlib's plt.legend explicitly, as done in the code.
- The best I could find about this is the pandas GitHub issue "pandas.DataFrame.plot(): Labels do not appear in legend" (#9542); it seems that is not fully sorted out.
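
If you would rather not depend on how DataFrame.plot handles label=, another option is to draw each group with matplotlib's own ax.plot, which takes label= directly. This is only a sketch on the same example data; the column name 'count' is my own choice, set via reset_index(name='count'), and is not part of the code above:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   'product': ['A', 'A', 'A', 'A', 'A', 'B', 'A', 'B', 'A', 'B',
                               'A', 'A', 'A', 'A', 'A']})

# Same counting as before, but the value column gets an explicit name
# ('count' is an assumed name, chosen for readability)
ctdf = (df.groupby('day')['product'].value_counts()
          .unstack(fill_value=0).stack()
          .reset_index(name='count'))

fig, ax = plt.subplots()

# One line per product; matplotlib uses label= for the legend entry
for key, data in ctdf.groupby('product'):
    ax.plot(data['day'], data['count'], label=key)

ax.set_xlabel('day')
ax.tick_params(axis='x', labelrotation=45)
ax.legend()
# call plt.show() or fig.savefig(...) to view the result
```

This way the legend entries come straight from the loop key, so there is no need to collect the labels in a separate list.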