I have the following data:
         Date  Chemical
10 2021-11-20        21
11 2021-11-26        19
12 2021-11-26        31
13 2021-11-26        32
14 2021-11-27        31
0  2021-12-06        21
6  2021-12-16        23
7  2021-12-16        24
8  2021-12-16        23
9  2021-12-16        25
1  2022-03-07        26
2  2022-03-08        28
3  2022-03-08        29
4  2022-03-08        28
5  2022-03-09        26
I plot the Chemical column on the y-axis against Date on the x-axis:
maindf.boxplot('Chemical', 'Date')
plt.xticks(rotation=40)
plt.show()
I get the following plot:
The x-axis shows the dates as equidistant rather than spaced according to how far apart they are in time. 2021-11-26 and 2021-11-27 should be close together, while 2021-12-16 and 2022-03-07 should be far apart.
Where is the problem, and how can it be corrected? Thanks for your help.
CodePudding user response:
Matplotlib treats each box as a categorical entry and spaces the boxes equally. To space them by date instead, use the positions parameter of matplotlib's boxplot. For this, compute how many days each date is from the minimum date (20-Nov-2021); the full range is 109 days in your case. Passing those day offsets as the positions will give you the plot below...
This code...
import pandas as pd
import matplotlib.pyplot as plt
# maindf is the DataFrame from the question
fig, ax = plt.subplots(figsize=(25,6))
maindf['Date'] = pd.to_datetime(maindf['Date'])
maindf.sort_values('Date', inplace=True)
# Add a position column: how many whole days each entry is from the earliest date
maindf['position'] = (maindf['Date'] - maindf.Date.min()).dt.days
# Pass the unique day offsets as the box positions, one per date
maindf.boxplot('Chemical', 'Date', positions=maindf.position.unique(), ax=ax)
plt.xticks(rotation=40, ha='right')
plt.show()
...will give you this plot
Note that the figure is quite wide because the dates span 109 days (max date - min date).
There is a large gap in the middle because the data has no entries between 2021-12-16 and 2022-03-07. If you don't need this big gap, you can change maindf.position manually to suit your needs.
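One possible way to adjust the positions by hand is to cap every gap between consecutive dates at a maximum width. The sketch below is only an illustration under that assumption: the data is copied from the question, and MAX_GAP (15 days) is a hypothetical value chosen for the example, not something from the original code.

```python
import pandas as pd

# Data copied from the question
maindf = pd.DataFrame({
    "Date": ["2021-11-20", "2021-11-26", "2021-11-26", "2021-11-26", "2021-11-27",
             "2021-12-06", "2021-12-16", "2021-12-16", "2021-12-16", "2021-12-16",
             "2022-03-07", "2022-03-08", "2022-03-08", "2022-03-08", "2022-03-09"],
    "Chemical": [21, 19, 31, 32, 31, 21, 23, 24, 23, 25, 26, 28, 29, 28, 26],
})
maindf["Date"] = pd.to_datetime(maindf["Date"])

# One entry per distinct date, in chronological order
unique_dates = maindf["Date"].drop_duplicates().sort_values().reset_index(drop=True)

# Gap in days to the previous date (0 for the first date)
gaps = unique_dates.diff().dt.days.fillna(0)

# Hypothetical cap: no gap is drawn wider than MAX_GAP days
MAX_GAP = 15
positions = gaps.clip(upper=MAX_GAP).cumsum().astype(int).tolist()

print(positions)  # → [0, 6, 7, 16, 26, 41, 42, 43]
```

The resulting list can be passed as positions=positions in the boxplot call in place of maindf.position.unique(). The 81-day gap between 2021-12-16 and 2022-03-07 is then drawn as 15 days, so the x-axis is no longer strictly to scale.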
If you also want the x-axis ticks to be evenly spaced, you will need to set the ticks yourself. I have used 11 dates (10 intervals) as an example, and have also changed the labels to DD-MM-YYYY format. The changes are as below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(25,6))
maindf = pd.read_excel('myinput.xlsx', 'Sheet60')
maindf['Date'] = pd.to_datetime(maindf['Date'])
maindf.sort_values('Date', inplace=True)
# Days elapsed since the earliest date, used as the x position of each box
maindf['position'] = (maindf['Date'] - maindf.Date.min()).dt.days
maindf.boxplot('Chemical', 'Date', positions=maindf.position.unique(), ax=ax)
# Turn on the horizontal grid (ax must already exist at this point)
ax.yaxis.grid(True)
# Build 11 evenly spaced date labels between the min and max date
my_xticklabels = []
for i in range(11):
    s = maindf.Date.min() + i * (maindf.Date.max() - maindf.Date.min()) / 10
    my_xticklabels.append(s.strftime("%d-%m-%Y"))
# 11 evenly spaced ticks across the full range (0 to 109 days for this data)
ax.set_xticks(np.linspace(0, maindf.position.max(), 11))
ax.set_xticklabels(my_xticklabels, fontsize=14)
plt.xticks(rotation=40, ha='right')
plt.show()
Plot
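As an alternative to building the tick labels by hand, the boxes could be placed at matplotlib date numbers so that the x-axis is a real date axis. This is a rough sketch of that idea, not the method used above: it calls matplotlib's ax.boxplot directly instead of DataFrame.boxplot, uses the data from the question, and assumes matplotlib 3.1 or later for the manage_ticks argument. The box width of 0.6 days is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question
maindf = pd.DataFrame({
    "Date": ["2021-11-20", "2021-11-26", "2021-11-26", "2021-11-26", "2021-11-27",
             "2021-12-06", "2021-12-16", "2021-12-16", "2021-12-16", "2021-12-16",
             "2022-03-07", "2022-03-08", "2022-03-08", "2022-03-08", "2022-03-09"],
    "Chemical": [21, 19, 31, 32, 31, 21, 23, 24, 23, 25, 26, 28, 29, 28, 26],
})
maindf["Date"] = pd.to_datetime(maindf["Date"])

# One array of Chemical values per distinct date, in chronological order
dates = maindf["Date"].drop_duplicates().sort_values()
values = [maindf.loc[maindf["Date"] == d, "Chemical"].to_numpy() for d in dates]

# Matplotlib date numbers are days as floats, so one x unit equals one day
positions = mdates.date2num(dates.to_numpy())

fig, ax = plt.subplots(figsize=(25, 6))
# manage_ticks=False stops boxplot from replacing the ticks with one fixed tick per box
result = ax.boxplot(values, positions=positions, widths=0.6, manage_ticks=False)

# Treat the x values as dates and label them in DD-MM-YYYY format
ax.xaxis_date()
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))
ax.yaxis.grid(True)
plt.setp(ax.get_xticklabels(), rotation=40, ha="right")
```

With this approach the locator picks the tick dates, so the hard-coded tick count and the manual label list are not needed. The trade-off is less control over exactly which dates get labelled.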