seaborn: how to reduce wall time for plotting the lineplot for 500k datapoints

Time:07-12

My dataset has around 500k data points. I am trying to create line plots to publish on a portal, but each figure takes about 30 minutes of wall time on average to render. (I'm running this in a Jupyter notebook.)

%%time
palette = sns.color_palette("hls", 5)
fig, axes = plt.subplots(2,1, sharex=True, figsize=(17,10))
fig.suptitle('Engine Torque & Speed with different dilutions for City')
sns.lineplot(ax=axes[0],hue=df_city['dilution'],x='timestamps', y='sp', data=df_city)
sns.lineplot(ax=axes[1],hue=df_city['dilution'],x='timestamps', y='tq', data=df_city)
plt.show();

Is there another way to plot these graphs with less wall time?

CodePudding user response:

I don't think the amount of data is the problem; see "Interactive large plot with ~20 million sample points and gigabytes of data".

The problem might be your x-axis. From the name 'timestamps' I assume the column holds datetimes. If you have not already done so, convert the 'timestamps' column to a datetime type:

df_city['timestamps'] = pd.to_datetime(df_city['timestamps'])

This could solve the problem.
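To check whether the conversion is needed at all, it can help to look at the column's dtype before and after. The following is a minimal sketch on a tiny made-up frame; the sample values are hypothetical and only the column name 'timestamps' comes from the question.

```python
import pandas as pd

# Hypothetical sample: timestamps stored as plain strings rather than datetimes
df_city = pd.DataFrame({
    "timestamps": ["2021-07-12 10:00:00", "2021-07-12 10:00:01", "2021-07-12 10:00:02"],
    "sp": [10, 12, 11],
})

dtype_before = df_city["timestamps"].dtype

# Parse the strings into a proper datetime column
df_city["timestamps"] = pd.to_datetime(df_city["timestamps"])
dtype_after = df_city["timestamps"].dtype

print(dtype_before, "->", dtype_after)
```

If the dtype printed before the conversion is already a datetime type, this step is unlikely to change the plotting time.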

CodePudding user response:

I generated a sample dataframe with 500k rows and it took only about 20 seconds to plot, so something other than Seaborn is probably going on. Are you able to downsample the data? Do you need all 500k points, or is the general trend of the data enough? Here is an example of one way to downsample, provided by @ogdenkev:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sp = np.random.randint(100, size=(500000))
tq = np.random.randint(100, size=(500000))
dilution = [10,20,30,40,50] * 100000
timestamps = np.arange(0,500000)

df_city = pd.DataFrame({"sp":sp, "tq":tq, "dilution":dilution, "timestamps":timestamps})

# Sort the dataframe by dilution so that the downsample affects the dataframe equally
df_city = df_city.sort_values(by=["dilution"]).reset_index(drop=True)

sequence_interval = 0.1
downsampled_interval = 5
step_size = np.round(downsampled_interval / sequence_interval).astype("int")

downsampled_df = df_city.iloc[::step_size, :]

palette = sns.color_palette("hls", 5)
fig, axes = plt.subplots(2,1, sharex=True, figsize=(17,10))
fig.suptitle('Engine Torque & Speed with different dilutions for City')
# Take hue from the downsampled frame itself so it matches the plotted rows
sns.lineplot(ax=axes[0], hue='dilution', x='timestamps', y='sp', data=downsampled_df)
sns.lineplot(ax=axes[1], hue='dilution', x='timestamps', y='tq', data=downsampled_df)
plt.show()
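
As a variation on the slicing above, the downsampling can also be done within each dilution group so that every group keeps the same share of rows in time order. This is only a sketch: the helper name downsample_per_group and its parameters are hypothetical, and the data is the same kind of random sample as above. It is not the method from the original answer.

```python
import numpy as np
import pandas as pd

def downsample_per_group(df, group_col, time_col, step):
    """Keep every `step`-th row within each group, in time order (hypothetical helper)."""
    ordered = df.sort_values([group_col, time_col])
    # cumcount numbers the rows inside each group 0, 1, 2, ...
    position = ordered.groupby(group_col).cumcount()
    return ordered[position % step == 0].reset_index(drop=True)

# Synthetic data with the same shape as the example above
rng = np.random.default_rng(0)
n = 500_000
df_city = pd.DataFrame({
    "sp": rng.integers(100, size=n),
    "tq": rng.integers(100, size=n),
    "dilution": np.tile([10, 20, 30, 40, 50], n // 5),
    "timestamps": np.arange(n),
})

downsampled_df = downsample_per_group(df_city, "dilution", "timestamps", step=50)
print(len(df_city), "->", len(downsampled_df))
```

The resulting downsampled_df can be passed to sns.lineplot through data= exactly as in the plotting code above.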