Connecting pairs of dots on the scatterplot with jitter with lines - Python

Time:02-03

I have two groups of points that overlap, so I need to add jitter when I plot them as a scatterplot. I also want to connect matching points from the two groups with lines (every point has a pair).

Many existing answers suggest something like this:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = [['abc', 'pre', 10], ['abc', 'post', 5], ['bce', 'pre', 10], ['bce', 'post', 5], ['cef', 'pre', 8], ['cef', 'post', 5]]
df = pd.DataFrame(data, columns=['ID', 'time', 'value'])
grouped = df.groupby('ID')

# One scatter plus one connecting line per ID; x stays categorical, so overlapping dots hide each other
for name, group in grouped:
    sns.scatterplot(x='time', y='value', data=group, color='#3C74BC')
    sns.lineplot(x='time', y='value', data=group, color='#3C74BC')
plt.show()

It works, but there is no jitter. If I add jitter via sns.stripplot(), the lines no longer connect the dots and instead start and end at arbitrary places.

Answer:

The approach below makes the following changes:

  • Convert the time to numeric (0 for 'pre' and 1 for 'post') via (df['time'] != 'pre').astype(float).
  • Add a random jitter to these values: np.random.uniform(-0.1, 0.1, len(df)). Depending on how many values you have, you might change 0.1 to a larger value.
  • Use sns.lineplot with a marker, which removes the need for a separate scatterplot.
  • Use hue='ID' to draw everything in one go.
  • As the hue mapping doesn't use color=, pass palette= a list with as many colors as there are distinct hue values.
  • Suppress the legend, as all hue values have the same color.
  • Assign the tick labels 'pre' and 'post' to the positions 0 and 1.
  • Set xlim so that the tick labels are at equal distances from the respective borders.
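The key point of the first two steps is that the jitter is computed once and stored in the DataFrame, so each dot and the end of its line share exactly the same x; stripplot computes its jitter internally, which is why separately drawn lines miss the dots. A minimal sketch of just the data preparation, assuming you want reproducible jitter (hence the seeded np.random.default_rng) and want to keep the original 'time' labels (the new column name 'x' and the variable jitter_width are hypothetical choices for illustration):

```python
import numpy as np
import pandas as pd

data = [['abc', 'pre', 10], ['abc', 'post', 5], ['bce', 'pre', 10],
        ['bce', 'post', 5], ['cef', 'pre', 8], ['cef', 'post', 5]]
df = pd.DataFrame(data, columns=['ID', 'time', 'value'])

# Seeded generator so the jitter is identical on every run (assumption: reproducibility is wanted)
rng = np.random.default_rng(0)
jitter_width = 0.1  # hypothetical name: half-width of the jitter band; widen it for many overlapping points

# Step 1: 'pre' -> 0.0, 'post' -> 1.0, stored in a new column so 'time' keeps its labels
df['x'] = (df['time'] != 'pre').astype(float)
# Step 2: one random offset per row; markers and lines will both read this same column
df['x'] = df['x'] + rng.uniform(-jitter_width, jitter_width, len(df))

print(df)
```

With this layout you would pass x='x' instead of x='time' to sns.lineplot.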
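import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

data = [['abc', 'pre', 10], ['abc', 'post', 5], ['bce', 'pre', 10], ['bce', 'post', 5], ['cef', 'pre', 8], ['cef', 'post', 5]]
df = pd.DataFrame(data, columns=['ID', 'time', 'value'])

# Map 'pre' -> 0.0 and 'post' -> 1.0, then add a small random horizontal jitter per row
df['time'] = (df['time'] != 'pre').astype(float) + np.random.uniform(-0.1, 0.1, len(df))

# One line per ID (hue), all in the same color; marker='o' draws the dots on the line ends
ax = sns.lineplot(x='time', y='value', data=df, hue='ID', marker='o',
                  palette=['#3C74BC'] * df['ID'].nunique(), legend=False)
# Put the category names back on the numeric axis (the labels argument needs Matplotlib >= 3.5)
ax.set_xticks([0, 1], ['pre', 'post'])
# Symmetric margins around the two positions
ax.set_xlim(-0.2, 1.2)
plt.show()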
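If you would rather not depend on seaborn's hue handling, the same idea can be sketched with plain Matplotlib: pivot to one row per ID with 'pre' and 'post' columns, jitter the x positions once, and pass 2-D arrays to ax.plot, which draws one line per column. This is an alternative to the answer above, not part of it; the seeded generator and the variable names wide, x, y and lines are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = [['abc', 'pre', 10], ['abc', 'post', 5], ['bce', 'pre', 10],
        ['bce', 'post', 5], ['cef', 'pre', 8], ['cef', 'post', 5]]
df = pd.DataFrame(data, columns=['ID', 'time', 'value'])

# Wide format: one row per ID, columns reordered to 'pre', 'post'
wide = df.pivot(index='ID', columns='time', values='value')[['pre', 'post']]

# Base positions 0 (pre) and 1 (post), plus an independent jitter for every point
rng = np.random.default_rng(0)
x = np.array([0.0, 1.0]) + rng.uniform(-0.1, 0.1, size=wide.shape)
y = wide.to_numpy()

fig, ax = plt.subplots()
# ax.plot draws one line per column, so transpose: each column is then one ID's (pre, post) pair
lines = ax.plot(x.T, y.T, marker='o', color='#3C74BC')
ax.set_xticks([0, 1])
ax.set_xticklabels(['pre', 'post'])
ax.set_xlim(-0.2, 1.2)
plt.show()
```

Because the x and y arrays are built side by side from the same rows, each line can only ever join the two points of one ID, regardless of how the jitter falls.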

[Figure: sns.lineplot with jitter]