I've managed to bypass this issue by manually setting my xticks. However, I would like to understand the reason behind this 'issue'.
I have the following data:
As you can see, I've calculated the average basket for our weekly campaign. However, when I plot it, the first couple of dates on the x-axis are changed.
plt.figure(figsize=(16,6))
plt.plot(pd.to_datetime(avg_basket_pbm["trunc_usage_date"]), avg_basket_pbm["average_base_spend"], label="Monday Coupon Users")
plt.plot(pd.to_datetime(avg_basket_reg["TRUNC(EV.EFFECTIVE_DAY,'DD')"]), avg_basket_reg["average_base_spend"], label="Regular Coupon Users")
# plt.xticks(pd.to_datetime(avg_basket_pbm["trunc_usage_date"],format="%Y-%m-%d"), labels = avg_basket_pbm["trunc_usage_date"])
plt.title("Average Basket", weight = "bold", fontsize=14)
plt.legend()
This seems to affect only the first couple of dates; the later ones remain unchanged.
Reproducible Example:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
dates = ['2021-10-11 00:00:00',
'2021-10-18 00:00:00',
'2021-10-25 00:00:00',
'2021-11-01 00:00:00',
'2021-11-08 00:00:00',
'2021-11-15 00:00:00',
'2021-11-22 00:00:00']
stamps = pd.to_datetime(dates)
vals = np.random.random(7)
plt.figure(figsize=(14,6))
plt.plot(stamps, vals)
CodePudding user response:
Matplotlib does not try to match the ticks exactly to your data because in most situations that just looks bad. To illustrate, I adapted your example and added just four datapoints and forced the xticks to match the data:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
dates = ['2021-10-11 00:00:00',
'2021-10-18 00:00:00',
'2021-10-25 00:00:00',
'2021-11-01 00:00:00',
'2021-11-08 00:00:00',
'2021-11-09 00:00:00', # added
'2021-11-10 00:00:00', # added
'2021-11-11 00:00:00', # added
'2021-11-12 00:00:00', # added
'2021-11-15 00:00:00',
'2021-11-22 00:00:00']
stamps = pd.to_datetime(dates)
vals = np.random.random(11)
plt.figure(figsize=(14,6))
plt.xticks(stamps, labels = dates)
plt.plot(stamps, vals)
The tick labels overlap and are unreadable. If you comment out the plt.xticks line, matplotlib is allowed to "do its own thing" and put the ticks where it considers "reasonable". In this specific case, that appears to be "weekly, starting from the 1st of the month". If you have a different opinion/algorithm for what counts as "reasonable" tick positions, then you have to place them manually.
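One way to place the ticks manually without listing every date is to swap in a different locator. The following is only a sketch, assuming the data points fall on Mondays (as the sample dates in the question do); the `fig`/`ax` names, the Agg backend, and the choice of `WeekdayLocator` are illustrative assumptions, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

dates = ['2021-10-11', '2021-10-18', '2021-10-25', '2021-11-01',
         '2021-11-08', '2021-11-15', '2021-11-22']
stamps = pd.to_datetime(dates)
vals = np.random.random(len(stamps))  # placeholder values, as in the question

fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(stamps, vals)

# Assumption: the campaign dates are Mondays, so put one major tick on every Monday
# instead of letting the automatic date locator choose the positions.
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap

# Tick positions converted back to dates, to compare against the data
monday_ticks = [d.date() for d in mdates.num2date(ax.get_xticks())]
```

With this locator the ticks follow the weekday rather than the day of the month, so they should line up with weekly data even when a month does not start on that weekday.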
CodePudding user response:
- The correct way to plot a pandas.DataFrame is with pandas.DataFrame.plot.
- Specifying xticks=df1.date1, or xticks=df2.date2, puts all of the dates from one DataFrame on the x-axis.
- The dates in the two DataFrames are not the same.
ax = df1.plot(x='date1', marker='o', figsize=(12, 5))
df2.plot(x='date2', marker='o', ax=ax, xticks=df1.date1, rot=90, grid=True)
ax.set_title("Average Basket", weight="bold", fontsize=14)
- Combine the dates from both DataFrames to use as xticks
# combine the dates to be used for xticks
xticks = df1.date1.tolist() + df2.date2.tolist()
# plot the dataframe
ax = df1.plot(x='date1', marker='o', figsize=(12, 5))
df2.plot(x='date2', marker='o', ax=ax, xticks=xticks, rot=90, grid=True)
ax.set_title("Average Basket", weight="bold", fontsize=14)
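The snippets above assume df1 and df2 already exist. Below is a self-contained sketch of the same "combine the dates" idea. The two DataFrames, their column names and their values are hypothetical placeholders, and the plotting uses plain matplotlib calls rather than DataFrame.plot so that the tick positions can be checked directly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-ins for the two DataFrames; the values are random placeholders
df1 = pd.DataFrame({
    "date1": pd.to_datetime(["2021-10-11", "2021-10-18", "2021-10-25", "2021-11-01"]),
    "average_base_spend": np.random.random(4),
})
df2 = pd.DataFrame({
    "date2": pd.to_datetime(["2021-10-13", "2021-10-20", "2021-10-27", "2021-11-03"]),
    "average_base_spend": np.random.random(4),
})

# Combine the dates from both frames, dropping duplicates and sorting them
xticks = sorted(set(df1["date1"].tolist() + df2["date2"].tolist()))

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df1["date1"], df1["average_base_spend"], marker="o", label="df1")
ax.plot(df2["date2"], df2["average_base_spend"], marker="o", label="df2")
ax.set_xticks(xticks)                    # one tick per date found in either frame
ax.tick_params(axis="x", rotation=90)    # vertical labels to avoid overlap
ax.grid(True)
ax.legend()
ax.set_title("Average Basket", weight="bold", fontsize=14)
```

De-duplicating and sorting the combined list is a choice made for this sketch; it keeps a date shared by both frames from being added as a tick twice.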
- Specifying