How do I create a clear line plot with a large number of values (Pandas)?

Time:07-25

I have a pandas DataFrame of 8664 rows that contains the following columns of importance to me: EASTVEL, NORTHVEL, Z_NAP, DATE TIME. The columns are defined as follows:

  • EASTVEL = Flow of current where negative (-) values are west and positive (+) values are east.
  • NORTHVEL = Flow of current where negative (-) values are south and positive (+) values are north.
  • Z_NAP = Depth of water
  • DATE TIME = Date time in this format: 2021-11-17 10:00:00

Now the problem that I encounter is the following: I want to generate a plot with EASTVEL on the x-axis and Z_NAP on the y-axis within the timeframe of 2021-11-17 10:00:00 until 2021-11-17 12:00:00 (I already created a DataFrame, frame_3_LW, that only contains those values). However, because I have so many values, I get a plot like the one below. I would like just one line describing the course of EASTVEL against Z_NAP, which would be much clearer. Can anybody help me with that?

enter image description here

User response:

Well, you've already identified the problem yourself. Your code is fine; the issue is that you have many points, and this kind of visualization doesn't work well if the variables change too much or if there are too many points. You could try plotting every 5th point (or something like that), but I doubt it would improve the graph.
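As a rough sketch of the "every 5th point" idea, the rows can be thinned with a positional slice before plotting. The data here is random and only stands in for the real DataFrame; the column names are taken from the question, while `step` and `thinned` are illustrative names, not anything from the original code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the real data: the column names follow the question,
# but the values are random and purely illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame({'EASTVEL': rng.normal(0, 1, 100),
                   'Z_NAP': rng.uniform(-10, 0, 100)})

step = 5                    # keep every 5th row; an arbitrary choice
thinned = df.iloc[::step]   # rows at positions 0, 5, 10, ...

fig, ax = plt.subplots()
ax.plot(thinned['EASTVEL'], thinned['Z_NAP'])
ax.set_xlabel('EASTVEL')
ax.set_ylabel('Z_NAP')
ax.set_title('Every 5th point')
# Call plt.show() to display the figure.
```

This only reduces the number of points drawn. The line still connects the remaining points in row order, so it may stay cluttered if the values jump around a lot.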

Even though it's not directly what you asked, I suggest you either:

  1. Plot both variables independently against time on the same graph
  2. Plot the ratio of the two variables. That way you have only one line describing the relationship between the variables.

Example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)

df = pd.DataFrame({'east_vel': np.random.randint(0, 5, 100),
                   'z_nap': np.random.randint(5, 10, 100),
                   'time': np.arange(100)},
)

# option 1, plot both variables
plt.figure()
plt.plot(df['time'], df['east_vel'], color='blue', label='east_vel')
plt.plot(df['time'], df['z_nap'], color='orange', label='z_nap')
plt.title('Two separate lines')
plt.legend()

# option 2, plot ratio between variables
plt.figure()
plt.plot(df['time'], df['east_vel']/df['z_nap'], label='east_vel / z_nap')
plt.title('Ratio between east_vel and z_nap')
plt.legend()
plt.show()

Here's the output of the code:

[Figure: Plotting two variables independently]

[Figure: Plotting the ratio between the two variables]

Even for a relatively small number of points (100), plotting them against each other would be very messy:

[Figure: Plotting the variables one against each other]
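The answer does not include the code for this last figure. A minimal sketch that would presumably produce something similar, reusing the same random DataFrame as in the example above, might look like this:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)

# Same random example data as in the answer's code.
df = pd.DataFrame({'east_vel': np.random.randint(0, 5, 100),
                   'z_nap': np.random.randint(5, 10, 100),
                   'time': np.arange(100)})

# Plot one variable directly against the other. The line connects the
# points in row (time) order, so it zigzags back and forth across the plot.
fig, ax = plt.subplots()
ax.plot(df['east_vel'], df['z_nap'])
ax.set_xlabel('east_vel')
ax.set_ylabel('z_nap')
ax.set_title('east_vel against z_nap')
# Call plt.show() to display the figure.
```

Because consecutive rows are joined regardless of their x-values, the result is a tangle of segments rather than a single readable curve, which is the same effect seen in the original plot.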
