I am trying to make a simple scatterplot with Matplotlib. I am passing a NumPy array for x and another one for y:
df = df.to_numpy() # this was originally a pandas DataFrame
print(df)
plt.scatter(df[:,1], df[:,2])
plt.show()
The print outputs:
[['B' '-693.3127738066283' '19.14412552031358']
['B' '-1633.974496310751' '40.13395450795514']
['B' '-2010.8973373308845' '-37.64969595561755']
...
['R' '-1034.7669874549774' '-76.93110447814361']
['R' '745.6579736997674' '-51.74835753276244']
['R' '-1473.8940519681794' '-28.58246870754514']]
However, the plot outputs this:
To give a better view of what's happening, if I plot only the first three datapoints it looks like this:
So the x- and y-coordinates are being "plotted", but the axes have no meaningful scale or values. Why is this happening, and how can I make a regular scatterplot?
Answer:
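As you can see from your print output, df contains only strings. matplotlib does not interpret them as numbers: it treats strings as categorical labels and gives each distinct string its own evenly spaced tick, which is why the axes have no meaningful scale.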
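A quick way to confirm this is to inspect the element types before plotting. The sketch below uses a small made-up array shaped like the printed output; the values are illustrative and not the asker's actual data.

```python
import numpy as np

# Hypothetical sample mimicking the printed array: every entry is a string.
arr = np.array(
    [
        ["B", "-693.31", "19.14"],
        ["R", "745.66", "-51.75"],
    ],
    dtype=object,
)

# Collect the Python type names found in the column used for x.
element_types = {type(v).__name__ for v in arr[:, 1]}
print(arr.dtype, element_types)
```

If the set contains `str` rather than `float`, the column needs converting before it is passed to `plt.scatter`.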
Replace:
plt.scatter(df[:,1], df[:,2])
with:
plt.scatter(df[:,1].astype(float), df[:,2].astype(float))
This converts the string arrays to floats, which matplotlib knows how to place on a numeric scale. Of course, this assumes every string in those columns represents a valid number.
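If some entries might not be valid numbers, `astype(float)` raises a `ValueError`. Here is a minimal sketch of one way to handle that, assuming you would rather skip the bad rows than stop. `to_float_column` is a hypothetical helper written for this example, not part of NumPy or matplotlib, and the sample data is made up.

```python
import numpy as np

def to_float_column(column):
    """Convert a column of numeric strings to floats; unparseable entries become NaN."""
    out = np.empty(len(column), dtype=float)
    for i, value in enumerate(column):
        try:
            out[i] = float(value)
        except (TypeError, ValueError):
            out[i] = np.nan
    return out

# Hypothetical data: the last y value is not a valid number.
arr = np.array(
    [
        ["B", "-693.31", "19.14"],
        ["R", "745.66", "n/a"],
    ],
    dtype=object,
)

x = to_float_column(arr[:, 1])
y = to_float_column(arr[:, 2])

# Keep only the rows where both coordinates parsed successfully.
valid = ~(np.isnan(x) | np.isnan(y))
x_clean, y_clean = x[valid], y[valid]
```

Passing `x_clean` and `y_clean` to `plt.scatter` then gives ordinary numeric axes, with the unparseable rows left out.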