Sharing Y-Axis Range for Multiple Horizontal Bar Issue

Time:09-22

I am having trouble getting multiple horizontal bar graphs to share one y-axis range, so that all values align with one another and the y-labels are visible only on the far-left plot. I have the dataframe below (data), which I group by a time interval using pd.Grouper before creating my figure and axes. Since I'm using a dataframe, I assign a plot to each of the axes I create. The code doesn't plot the values correctly. If I remove sharey=True, each plot is displayed correctly, but of course the plots are not aligned to one common y-axis.

import pandas as pd
import matplotlib.pyplot as plt

#group by time interval
data_gb = data.groupby([pd.Grouper(freq='1min')])
#create and set y-axis range limits from original dataframe
custom_ylim = (data.price.min(), data.price.max())
#number of plots based on number of intervals
numplot = len(data_gb)
# create a tuple of axe names seems like a hack 
axes = tuple(['ax' + str(n) for n in range(1, numplot + 1)])
f, axes = plt.subplots(1, numplot, sharey= True, sharex=True)
#iterate and assign plot to each axes
for (t, prices), ax in zip(data_gb, axes):
    ax.set_ylim(custom_ylim) #this doesn't seem to do anything
    prices.plot.barh('price', stacked=True, ax=ax)
    ax.legend_ = None   
plt.show()


timestamp                  price     colA      colB     colC     colD
2021-09-08 13:30:00+00:00  11.00      0.0  140037.0      0.0      0.0
2021-09-08 13:30:00+00:00  11.01  21963.0   34732.0   2961.0   1190.0
2021-09-08 13:30:00+00:00  11.02  17578.0   15434.0  12309.0      2.0
2021-09-08 13:30:00+00:00  11.03   2493.0   12393.0  11229.0    907.0
2021-09-08 13:30:00+00:00  11.04  17240.0   16406.0   1479.0    100.0
...                          ...      ...       ...      ...      ...
2021-09-08 13:31:00+00:00  11.01   8520.0   22579.0   4031.0    248.0
2021-09-08 13:31:00+00:00  11.02  64626.0   10330.0  11340.0   3862.0
2021-09-08 13:31:00+00:00  11.03  10967.0    5144.0   2621.0    640.0
2021-09-08 13:31:00+00:00  11.04  15168.0    2907.0      0.0      4.0
2021-09-08 13:31:00+00:00  11.05   1279.0       0.0      0.0      0.0

Plotted incorrectly with the shared y-axis.

Plotted individually without sharing the y-axis. Comparing the two, you can see that the first graph in the previous plot is missing values.

Answer:

Pandas bar plotting doesn't always work intuitively, which makes sharing axes quite complicated. One problem is that the bars get neither a numerical nor a purely categorical tick position. Instead, the bars are numbered 0, 1, 2, ... and the ticks are given their labels afterwards.
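To make the positioning issue concrete, here is a minimal sketch (the tiny dataframe and the variable names are made up for illustration) that inspects where pandas puts the ticks of a horizontal bar plot. The prices are 11.00, 11.01 and 11.05, yet the ticks are expected to sit at consecutive integers, with the prices attached only as labels:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: three prices that are NOT evenly spaced
df = pd.DataFrame({"price": [11.00, 11.01, 11.05], "colA": [5, 3, 8]})

fig, ax = plt.subplots()
df.plot.barh(x="price", y="colA", ax=ax, legend=False)

# Positions of the y ticks in data coordinates, and the text shown next to them
tick_positions = [float(p) for p in ax.get_yticks()]
tick_labels = [label.get_text() for label in ax.get_yticklabels()]

print(tick_positions)  # row numbers, not prices
print(tick_labels)     # the prices, converted to strings
```

Because every subplot numbers its own rows from 0, a shared y-axis lines up row numbers rather than prices, which is consistent with the misaligned plot in the question.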

Another problem is that bars for a numerical column can get a weird conversion to string (e.g. a value 12.34 might get displayed as 12.340000001 due to some floating point weirdness). Some weirdness is visible in your plots, such as 7.4 being displayed instead of 7.40.
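A short sketch of that string conversion in plain Python (the values are chosen only for illustration): the default conversion exposes floating point noise and drops trailing zeros, while a fixed two-decimal format gives stable labels.

```python
# Default string conversion shows the floating point representation error
noisy = 0.1 + 0.2
default_label = str(noisy)       # '0.30000000000000004'
fixed_label = f"{noisy:.2f}"     # '0.30'

# Default conversion also drops the trailing zero
short_label = str(7.4)           # '7.4'
padded_label = f"{7.4:.2f}"      # '7.40'

print(default_label, fixed_label, short_label, padded_label)
```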

The workaround I propose:

  • convert the price column to strings with exactly 2 decimals
  • when plotting, set the price as index and reindex to the full price range, so that all subplots get the same range
  • annoyingly, the full price range needs to be calculated before converting the column to string, and then the range also needs to be converted to string

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

# create some test data
times = np.repeat(pd.date_range('2021-09-08 13:30', '2021-09-08 13:34', freq='1min'), 20)
data = pd.DataFrame({'timestamp': times,
                     'price': np.round(np.arange(1100, 1200) / 100 - np.repeat([0, 0.25, 0.5, 0.55, 0.9], 20), 2),
                     'colA': np.random.randint(1000, 5000, 100),
                     'colB': np.random.randint(1000, 5000, 100),
                     'colC': np.random.randint(1000, 5000, 100),
                     'colD': np.random.randint(1000, 5000, 100)}).set_index('timestamp')
# calculate the full price range, first numeric, then convert to string
full_price_range = [f'{x:.2f}' for x in np.arange(data['price'].min(), data['price'].max() + 0.0001, 0.01)]
# now convert the price column to strings
data['price'] = data['price'].apply(lambda x: f'{x:.2f}')

data_gb = data.groupby([pd.Grouper(freq='1min')])

numplot = len(data_gb)
fig, axes = plt.subplots(1, numplot, sharey=True, sharex=True, figsize=(12, 4))
for (t, prices), ax in zip(data_gb, axes):
    prices.set_index('price').reindex(full_price_range).plot.barh(stacked=True, legend=False, ax=ax)
    ax.set_title(t)
fig.tight_layout()
plt.show()

pandas barh with shared y-axis
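As a small illustration of the reindexing step used in the loop above, the sketch below applies it to one hypothetical group (the data and the four-entry price range are invented for the example). Prices that are absent from the group become rows of NaN, so every group ends up with the same rows in the same order:

```python
import pandas as pd

# Hypothetical group that only contains two of the four possible prices
group = pd.DataFrame({"price": ["11.01", "11.03"], "colA": [10.0, 20.0]})
full_price_range = ["11.00", "11.01", "11.02", "11.03"]

# Same pattern as in the plotting loop: price becomes the index, then reindex
aligned = group.set_index("price").reindex(full_price_range)

row_labels = list(aligned.index)
is_missing = aligned["colA"].isna().tolist()

print(aligned)
```

Since row number k then refers to the same price in every subplot, the integer bar positions that pandas assigns line up across the shared axis.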

Note that the line axes = tuple(....) in the original code has no effect, because in the next line the variable named axes gets a new value.
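A minimal sketch of that rebinding, assuming three subplots for illustration: the tuple of strings is simply discarded once plt.subplots assigns its own return value to the same name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

numplot = 3  # assumed number of intervals, for illustration only

# First assignment: just a tuple of strings
axes = tuple("ax" + str(n) for n in range(1, numplot + 1))
names_before = axes

# Second assignment: the name now refers to the array of Axes objects
fig, axes = plt.subplots(1, numplot, sharey=True, sharex=True)

print(names_before)
print(type(axes), len(axes))
```

So the first line can be dropped without changing the result.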
