I have a dataset with three columns. The first holds the actual data, the second holds the mean of the first (repeated on every row), and the third is a category. I want to show both the data and the mean in a scatterplot. However, the mean is plotted N times, once per row, instead of only once, which makes the visualization difficult to control. Is there a way to plot the mean only once?
Here is a toy example where the orange dots (indicating the mean) are plotted 14 times on top of each other.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({
    'Numbers': [10, 20, 30, 40, 50, 60, 11, 59, 12, 58, 13, 57, 25, 45],
    'Mean': [35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35],
    'y': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
})
fig, ax = plt.subplots(1)
ax.scatter(y="y", x="Numbers", data=df, s=200, alpha=.5, label="Mean response")
ax.scatter(y="y", x="Mean", data=df, s=200, alpha=.15, label="Average mean response")
legend = ax.legend(loc="lower right", framealpha=.4)
CodePudding user response:
Just plot one value:
fig, ax = plt.subplots(1)
ax.scatter(y="y", x="Numbers", data=df, s=200,
           alpha=.5, label="Mean response")
ax.scatter(df['Numbers'].mean(), df['y'].mean(),  # plot one value, not 14
           s=200, alpha=.15,
           label="Average mean response")
legend = ax.legend(loc="lower right", framealpha=.4)
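If you would rather reuse the Mean column that is already in the DataFrame instead of recomputing it, one option is to collapse the repeated rows with drop_duplicates before plotting. This is only a sketch built on the toy data from the question; the names `means` and `mean_points` and the Agg backend line are my own additions, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data from the question: the mean (35) is repeated on all 14 rows.
df = pd.DataFrame({
    "Numbers": [10, 20, 30, 40, 50, 60, 11, 59, 12, 58, 13, 57, 25, 45],
    "Mean": [35] * 14,
    "y": [1] * 14,
})

# Keep one row per distinct (Mean, y) pair instead of 14 identical rows.
means = df[["Mean", "y"]].drop_duplicates()

fig, ax = plt.subplots(1)
ax.scatter(x="Numbers", y="y", data=df, s=200, alpha=.5, label="Mean response")
# A single point needs no extra transparency, so alpha stays at .5 here.
mean_points = ax.scatter(x="Mean", y="y", data=means, s=200, alpha=.5,
                         label="Average mean response")
legend = ax.legend(loc="lower right", framealpha=.4)
```

Since the mean is now drawn as exactly one marker, its alpha, size and color behave the way you set them rather than stacking 14 times.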
If you have several groups, compute one mean per group and plot those:
ax.scatter(x='Numbers', y='y',
           data=df.groupby('y', as_index=False)['Numbers'].mean(),
           s=200, alpha=.15, label="Average mean response")
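The toy DataFrame only has one category, so the groupby version is easier to see with a second group. Here is a rough, self-contained sketch where I split the same numbers across two hypothetical categories (`df2`, `group_means` and `group_points` are made-up names, and the split into y=1 and y=2 is an assumption for illustration only):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: the question's numbers split across two categories.
df2 = pd.DataFrame({
    "Numbers": [10, 20, 30, 40, 50, 60, 11, 59, 12, 58, 13, 57, 25, 45],
    "y": [1] * 7 + [2] * 7,
})

# One row per category, holding that category's mean of Numbers.
group_means = df2.groupby("y", as_index=False)["Numbers"].mean()

fig, ax = plt.subplots(1)
ax.scatter(x="Numbers", y="y", data=df2, s=200, alpha=.5, label="Mean response")
# One marker per group, placed at (group mean, group's y value).
group_points = ax.scatter(x="Numbers", y="y", data=group_means, s=200, alpha=.5,
                          label="Average mean response")
legend = ax.legend(loc="lower right", framealpha=.4)
```

Each category gets exactly one mean marker on its own row of the plot, no matter how many data points it contains.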