I am making a scatter plot with the geyser dataset from seaborn. I am coloring the points based on the 'kind' column, but for some reason the legend only shows 'long' and leaves out 'short'. I don't know what I am missing. I was also wondering if there is a simpler way to color-code the data, one that does not use a for-loop. Thanks!
import matplotlib.pyplot as plt
import seaborn as sns
geyser_df = sns.load_dataset('geyser')
x = geyser_df['waiting']
y = geyser_df['duration']
col = []
for i in range(len(geyser_df)):
    if geyser_df['kind'][i] == 'short':
        col.append('MediumVioletRed')
    elif geyser_df['kind'][i] == 'long':
        col.append('Navy')
plt.scatter(x, y, c=col)
plt.legend(('long', 'short'))
plt.xlabel('Waiting')
plt.ylabel("Duration")
plt.suptitle("Waiting vs Duration")
plt.show()
CodePudding user response:
You're passing x = geyser_df['waiting'] and y = geyser_df['duration'] to plt.scatter as a single dataset, so it creates a single artist. plt.legend(('long', 'short')) assigns the labels to the existing artists in order, so only the first label, "long", is used (one legend entry per dataset). I don't have much experience with this type of library, but to get the plot you describe you can plot each group as its own dataset, like this:
long = [[], []]
short = [[], []]
col = ['MediumVioletRed', 'Navy']
# Split the rows into two datasets so each one gets its own legend entry
for i in range(len(geyser_df["kind"])):
    if geyser_df["kind"][i] == "long":
        long[0].append(geyser_df['waiting'][i])
        long[1].append(geyser_df['duration'][i])
    else:
        short[0].append(geyser_df['waiting'][i])
        short[1].append(geyser_df['duration'][i])
plt.scatter(long[0], long[1], c=col[1], label="long")
plt.scatter(short[0], short[1], c=col[0], label="short")
plt.legend()
plt.xlabel('Waiting')
plt.ylabel("Duration")
plt.suptitle("Waiting vs Duration")
plt.show()
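The same idea, one scatter call per group, can be sketched more compactly by letting pandas do the splitting. This is only a sketch: the small DataFrame below is a made-up stand-in for geyser_df (same column names, invented values), the colors dict is a name introduced for illustration, and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for geyser_df: same columns, made-up values
geyser_df = pd.DataFrame({
    "waiting": [79, 54, 74, 62, 85, 55],
    "duration": [3.6, 1.8, 3.3, 2.3, 4.5, 2.9],
    "kind": ["long", "short", "long", "short", "long", "short"],
})
colors = {"long": "Navy", "short": "MediumVioletRed"}

fig, ax = plt.subplots()
# groupby yields one sub-frame per 'kind'; each scatter call creates
# its own artist, so the legend gets one entry per group
for kind, group in geyser_df.groupby("kind"):
    ax.scatter(group["waiting"], group["duration"], c=colors[kind], label=kind)
ax.legend()
ax.set_xlabel("Waiting")
ax.set_ylabel("Duration")
fig.suptitle("Waiting vs Duration")

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

The loop here runs once per group rather than once per row, so it stays short no matter how many rows the data has.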
CodePudding user response:
- The correct way to do this with pandas is with
- The easiest way is with seaborn, which is a high-level API for matplotlib
fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=df, x='waiting', y='duration', hue='kind', palette=['Navy', 'MediumVioletRed'], ax=ax)  # axes-level plot
# Figure-level alternative, which creates its own figure:
sns.relplot(data=df, x='waiting', y='duration', hue='kind', palette=['Navy', 'MediumVioletRed'], height=4, aspect=1.5)
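If seaborn is not an option, the per-row loop from the question can also be avoided in plain matplotlib by mapping the 'kind' column to colors and building the legend entries by hand. The following is a rough sketch under assumptions: the DataFrame is a made-up stand-in with the same column names, and colors and handles are names introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D

# Hypothetical stand-in for the geyser data: same columns, made-up values
geyser_df = pd.DataFrame({
    "waiting": [79, 54, 74, 62, 85, 55],
    "duration": [3.6, 1.8, 3.3, 2.3, 4.5, 2.9],
    "kind": ["long", "short", "long", "short", "long", "short"],
})
colors = {"long": "Navy", "short": "MediumVioletRed"}

fig, ax = plt.subplots()
# Series.map replaces the row-by-row loop: each 'kind' becomes its color
point_colors = geyser_df["kind"].map(colors).tolist()
ax.scatter(geyser_df["waiting"], geyser_df["duration"], c=point_colors)

# A single scatter call creates a single artist, so the legend entries
# are supplied as proxy handles, one per category
handles = [
    Line2D([], [], marker="o", linestyle="", color=color, label=kind)
    for kind, color in colors.items()
]
ax.legend(handles=handles)

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

The trade-off is that the legend has to be kept in sync with the colors dict manually, which is the bookkeeping that hue handles automatically in seaborn.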