Matplotlib lines appear not to be accurate when plotting to PDF

It appears that there are some very small inaccuracies when using plt.plot() to generate PDFs with Matplotlib. Below are some simple examples in which lines plotted with plt.plot() are not aligned with the original data points plotted with plt.scatter(). The differences are small, but they could still be noticed in papers etc. when looking closely at the PDF. I am using Matplotlib 3.6.1.

Example 1:

import pandas as pd
import matplotlib as mlp
mlp.use("Agg")
import matplotlib.pyplot as plt

df = pd.read_csv("my_data.csv")

fig = plt.figure(figsize=(1.5,1.5))
plt.plot(df['X'], df['Y'], color='b', linewidth=0.1)
plt.scatter(df['X'], df['Y'], color='k', s=0.05, linewidths=0)
fig.savefig("res.pdf")
fig.savefig("res.png", dpi=5000)

This is the resulting PDF: [image: PDF plot 1]

Let's zoom in on some details in the PDF (the black points and blue lines are not aligned): [image: PDF plot 1, zoomed]

The same segment in the PNG (everything is aligned): [image: PNG plot 1, zoomed]

Example 2:

The same effect can be reproduced with generated data:

import numpy as np
import matplotlib as mlp
mlp.use("Agg")
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(1.0,1.5))
X = np.arange(0,12,0.01)
data = np.sin(X) + np.random.normal(0, 0.005, (len(X),))  # noisy sine wave
plt.plot(X, data, color='b', linewidth=0.06)
plt.scatter(X, data, color='k', s=0.05, linewidths=0)
fig.savefig("res.pdf")

This is the resulting PDF: [image: PDF plot 2]

Let's zoom in on some details in the PDF (the black points and blue lines are not aligned): [image: PDF plot 2, zoomed]

Example 3:

These deviations are small, but they are visible in some real-world examples. In the following plot, I used plt.fill_between() and plotted the very same lines with plt.plot(). Here the inaccuracies of the lines are directly visible in the PDF without zooming in: [image: PDF plot 3]

Question:

To me this behavior is quite surprising. The PNG plot (with high DPI) does not show the misalignment. What is going on here? Changing the backend does not seem to improve results. Is there some way of making these plots more 'accurate'?

Similar question: Link

Answer:

It turns out that Matplotlib simplifies line paths, and the effect becomes visible when the figure is small and the data are very dense; see the details here and here.

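The simplification works by iteratively merging line segments into a single vector until the next line segment's perpendicular distance to the vector (measured in display-coordinate space) is greater than the path.simplify_threshold parameter. Matplotlib currently defaults to a conservative simplification threshold of 1/9. Only the line path is simplified; the scatter markers are still drawn at the exact data positions, which is why the two no longer line up. This also explains the PNG: there, display coordinates are pixels, and with dpi=5000 a pixel is 1/5000 inch, whereas the PDF backend works at 72 dpi (PDF points). The same threshold of 1/9 therefore corresponds to roughly 1/9 pt (about 0.04 mm) in the PDF, but to a distance about 70 times smaller in the PNG.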
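To make the merging rule concrete, here is a minimal pure-Python sketch of the idea described above. It is not Matplotlib's implementation (that one is written in C++ and handles more cases); the function names `perpendicular_distance` and `simplify` are hypothetical, and the points are assumed to be (x, y) tuples already in display units (pixels, or PDF points).

```python
import math


def perpendicular_distance(p, origin, direction):
    """Distance from point p to the infinite line through origin along direction."""
    dx, dy = direction
    px, py = p[0] - origin[0], p[1] - origin[1]
    # |cross product| / |direction| is the perpendicular distance.
    return abs(px * dy - py * dx) / math.hypot(dx, dy)


def simplify(points, threshold=1 / 9):
    """Greedy segment merging in the spirit of path.simplify (simplified sketch).

    The first segment after the current anchor fixes the direction of the
    merged vector; following points are merged while they stay within
    `threshold` (perpendicular distance) of that vector.
    """
    if len(points) < 3:
        return list(points)
    out = [points[0]]
    anchor = last = points[0]
    direction = None
    for p in points[1:]:
        if direction is not None and perpendicular_distance(p, anchor, direction) > threshold:
            # p deviates too much: close the merged vector at the last merged point.
            out.append(last)
            anchor = last
            direction = None
        if direction is None:
            d = (p[0] - anchor[0], p[1] - anchor[1])
            if d != (0, 0):  # skip duplicate points when fixing the direction
                direction = d
        last = p
    if last != out[-1]:
        out.append(last)
    return out


noisy = [(0, 0), (1, 0.01), (2, -0.01), (3, 0.02), (4, 0)]
print(simplify(noisy))                  # default threshold of 1/9
print(simplify(noisy, threshold=0.01))  # much smaller threshold
```

With the default of 1/9, the small wiggles in `noisy` collapse into a single segment from (0, 0) to (4, 0), so the intermediate points are no longer on the drawn line; with threshold=0.01, every point is kept. This is the same kind of deviation that separates the blue line from the black scatter points in the PDF.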

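To improve plotting accuracy in small plots, set the threshold to a smaller value. It is safest to do this before the plotting calls, because a line's path picks up the setting when it is created, e.g.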

mlp.rcParams["path.simplify_threshold"] = 0.01

or just turn off path simplification:

mlp.rcParams['path.simplify'] = False
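The settings can also be scoped to a single figure with `matplotlib.rc_context`. The sketch below redraws Example 2 three times (default settings, a threshold of 0.01, and simplification turned off), saves each as a PDF, and counts how many vertices the blue line keeps after Matplotlib's own path cleanup (`Path.cleaned`). The helper `save_example`, the file names, and the fixed random seed are illustrative assumptions; the figure is created at dpi=72 so that its display units match the PDF points the PDF backend works in, and the axes are assumed to be linear.

```python
import numpy as np
import matplotlib as mlp
mlp.use("Agg")
import matplotlib.pyplot as plt

X = np.arange(0, 12, 0.01)
rng = np.random.default_rng(0)  # fixed seed so the three runs use identical data
data = np.sin(X) + rng.normal(0, 0.005, len(X))


def save_example(filename, rc=None):
    """Draw Example 2 under temporary rcParams and save it (hypothetical helper).

    Returns the number of vertices the blue line keeps after Matplotlib's
    path cleanup, measured in 72-dpi display units (PDF points).
    """
    with mlp.rc_context(rc):
        # dpi=72 makes display units match the PDF backend's points.
        fig = plt.figure(figsize=(1.0, 1.5), dpi=72)
        (line,) = plt.plot(X, data, color='b', linewidth=0.06)
        plt.scatter(X, data, color='k', s=0.05, linewidths=0)
        # Keep savefig inside the block: paths can also be built at draw
        # time, and each path reads the rcParams when it is created.
        fig.savefig(filename)
        path = line.get_path()
        cleaned = path.cleaned(transform=line.get_transform(),
                               simplify=path.should_simplify)
        plt.close(fig)
    return len(cleaned.vertices)


default = save_example("res_default.pdf")
fine = save_example("res_fine.pdf", {"path.simplify_threshold": 0.01})
off = save_example("res_off.pdf", {"path.simplify": False})
print(default, fine, off)
```

If the explanation above is right, the default run should keep noticeably fewer vertices than the other two, and the blue lines in res_fine.pdf and res_off.pdf should follow the scatter points. Because rc_context restores the previous rcParams on exit, other figures in the same script keep the default simplification.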