Speed up loop for matplotlib in python-CodePudding

I have a similar, but larger data set with more dates and over ten thousand rows. Usually, it takes 3mins or longer to run the code and plot. I think the problem comes from loop. Looping is time-consuming in python. In this case, would be appreciated if someone knows how to rewrite the code to make it faster.

  data = {'Date' : ["2022-07-01"]*5000   ["2022-07-02"]*5000  ["2022-07-03"]*5000,        
          'OB1' : range(1,15001),
          'OB2' : range(1,15001)}

  df = pd.DataFrame(data)      

  # multi-indexing
  df = df.set_index(['Date'])

  # loop for plot
  i = 1
  fig, axs = plt.subplots(nrows = 1, ncols = 3, sharey = True)
  fig.subplots_adjust(wspace=0) 
  for j, sub_df in df.groupby(level=0):
      plt.subplot(130   i)
      x = sub_df['OB1']
      y = sub_df['OB2']
      plt.barh(x, y) 
      i = i   1
  plt.show()

CodePudding user response：

The slowness comes from the barh function, which involves drawing many rectangles. While your example is already pretty slow (a minute on my laptop), this one runs in less than a second. I replaced barh with fill_betweenx, which fills the area between two curves (here 0 and the height of bars) instead of drawing rectangles. It goes much faster but is not strictly the same. Also, I use the option step=post, so if you zoom, you will have a bar-style graph.

import pandas as pd
import matplotlib.pyplot as plt

data = {
    "Date": ["2022-07-01"] * 5000
      ["2022-07-02"] * 5000
      ["2022-07-03"] * 5000,
    "OB1": range(1, 15001),
    "OB2": range(1, 15001),
}

df = pd.DataFrame(data)

# multi-indexing
df = df.set_index(["Date"])

# loop for plot
i = 1
fig, axs = plt.subplots(nrows=1, ncols=3, sharey=True)
fig.subplots_adjust(wspace=0)
for j, sub_df in df.groupby(level=0):
    plt.subplot(130   i)
    x = sub_df["OB1"]
    y = sub_df["OB2"]
    # plt.barh(x, y)
    plt.fill_betweenx(y, 0, x, step="post")
    i = i   1
plt.show()