I have a DataFrame in Python pandas like below:
date_col - in "datetime64" format
sales - in "int64" format
date_col    sales
2019-01-05  100
2019-03-20  500
2019-04-28  290
...         ...
I need to create a time series plot and mark the 5 days with the highest sales in a different color.
Currently my code looks like this:
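import matplotlib.pyplot as plt

# plot daily sales; the x-axis shows df's index (date_col must be the index to get dates)
df['sales'].plot(linewidth=1.5,
                 grid=True,
                 marker="o",
                 linestyle="-",
                 markersize=4,
                 label="Daily sales",
                 color="steelblue")
plt.xlabel("date")
plt.ylabel("sales")
plt.legend()
plt.show()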
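Independent of the plot styling, the data needed for the markers is just the 5 rows with the largest sales and their dates as year-month-day strings. A minimal sketch with pandas' DataFrame.nlargest and Series.dt.strftime, assuming date_col is an ordinary datetime64 column (not the index) and using only the three sample rows shown above:

```python
import pandas as pd

# sample data shaped like the question's DataFrame (only the three rows shown there)
df = pd.DataFrame({
    "date_col": pd.to_datetime(["2019-01-05", "2019-03-20", "2019-04-28"]),
    "sales": [100, 500, 290],
})

# nlargest returns the rows with the highest sales, sorted in descending order
top = df.nlargest(5, "sales")

# format the dates as year-month-day strings for labelling
top_dates = top["date_col"].dt.strftime("%Y-%m-%d").tolist()
print(top_dates)  # ['2019-03-20', '2019-04-28', '2019-01-05']
```

Because nlargest already sorts in descending order, the first date is the top-selling day; with fewer than 5 rows it simply returns all of them.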
But the final result should have:
- vertical lines marking the 5 days with the highest sales
- labels with the dates of those 5 days in year-month-day format
How can I do that in Python? How can I modify my code, or is there another way to do it?
@mrCopiCat I used your code and got the result below. Why?
Answer:
Well, you can combine matplotlib's built-in ax.annotate function with ax.vlines. Here's an example that marks the 5 maximums (for the sake of the example I used simple int values for the dates, but the same approach should work with your datetime values):
import random

import matplotlib.pyplot as plt
import pandas as pd

# init figure and axis
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set(title="maximum sales ...")

# sample df: integer "dates" 0..19 with random sales between 1 and 30
data = {'date_col': list(range(20)), 'sales': [random.randint(1, 30) for _ in range(20)]}
df = pd.DataFrame.from_dict(data).set_index('date_col')

# plotting the df
df.plot(ax=ax)

# adding the lines
num_max = 5  # change this if you want more or less points
for xmax, ymax in df['sales'].nlargest(num_max).items():
    # vertical line from 0 up to the sales value of this day
    ax.vlines(xmax, 0, ymax, color="tab:red")
    # label just above the line (for datetime values, f'{xmax:%Y-%m-%d}' gives year-month-day)
    ax.annotate(f'{xmax}', xy=(xmax, ymax), xytext=(xmax, ymax + 1), color="tab:red")

# setting size limit and plotting
ax.set_ylim(0, 40)  # suits the sample values (1-30); change or remove it for your data
plt.show()
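For the question's actual data (datetime64 dates, sales in the hundreds), the same vlines/annotate idea can be combined with the original styling. This is a sketch, not the answerer's code: plot_top_sales is a hypothetical helper name, and it assumes df has a datetime64 date_col column and an int64 sales column. It draws with ax.plot rather than Series.plot, since pandas may use its own x-axis units for a datetime index, which are not guaranteed to line up with the positions passed to ax.vlines.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_sales(df, num_max=5):
    # hypothetical helper: plots daily sales and marks the num_max highest-sales days;
    # assumes df has a datetime64 'date_col' column and an int64 'sales' column
    series = df.set_index("date_col")["sales"].sort_index()

    fig, ax = plt.subplots()
    # same styling as the question's original plot, drawn with plain matplotlib dates
    ax.plot(series.index, series.to_numpy(), linewidth=1.5, marker="o", linestyle="-",
            markersize=4, label="Daily sales", color="steelblue")

    top = series.nlargest(num_max)
    for day, value in top.items():
        # vertical line from 0 up to that day's sales value
        ax.vlines(day, 0, value, color="tab:red")
        # date label in year-month-day format, 5 points above the top of the line
        ax.annotate(day.strftime("%Y-%m-%d"), xy=(day, value), xytext=(0, 5),
                    textcoords="offset points", ha="center", color="tab:red")

    ax.set_xlabel("date")
    ax.set_ylabel("sales")
    ax.grid(True)
    ax.legend()
    return fig, ax, top


# the three sample rows from the question
df = pd.DataFrame({
    "date_col": pd.to_datetime(["2019-01-05", "2019-03-20", "2019-04-28"]),
    "sales": [100, 500, 290],
})
fig, ax, top = plot_top_sales(df)
plt.show()
```

Using offset points for xytext keeps each label a fixed distance above its line whether sales are in the tens or the hundreds, and there is no hard-coded y-limit, so matplotlib autoscales to the data.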