I am trying to use a matplotlib scatter plot in Python (Jupyter Notebook) to create a t-SNE visualization, with different colors for different points.
I am ashamed to admit that I have mostly borrowed prewritten code, so some of the nuance is far beyond me. However, I am running into a ValueError that I can't seem to solve, even after reading the answers to similar ValueError questions here on Stack Overflow.
Running the scatter call (relevant code below) fails with two chained errors: ValueError: RGBA sequence should have length 3 or 4, and ValueError: 'c' argument has 470000 elements, which is inconsistent with 'x' and 'y' with size 2500.
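The two numbers in the second message already point at the cause: 470000 / 2500 = 188, so `c` received 188 values for every plotted point instead of one. That is consistent with `c` being an entire 2500 × 188 data matrix. A minimal sketch of the arithmetic, assuming (hypothetically) a data array of that shape:

```python
import numpy as np

# Assumed shape of the loaded data matrix: 2500 samples x 188 features.
n_points, n_features = 2500, 188
data = np.zeros((n_points, n_features))

# The error message reports the total number of elements in c,
# which for a 2-D array is rows * columns.
c_elements = data.size
print(c_elements)              # 470000
print(c_elements // n_points)  # 188 values per point instead of 1
```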
if __name__ == "__main__":
    print("Run Y = tsne.tsne(X, no_dims, perplexity) to perform t-SNE on your dataset.")
    print("Running example on ECG samples...")
    X = np.loadtxt("ecg_test_tsne_randomremoved_tagremoved.txt")
    labels = np.loadtxt("ecg_test_tsne_randomremoved_tagremoved.txt")
    Y = tsne(X, 2, 50, 20.0)
    pylab.scatter(Y[:, 0], Y[:, 1], 20, labels)
    pylab.show()
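Note that `X` and `labels` are read from the same file, so `labels` is not a list of labels but a second copy of the whole data matrix. (In van der Maaten's own example the labels come from a separate file, `mnist2500_labels.txt`, with one label per line.) The sketch below reproduces the effect with a tiny made-up stand-in for the file; `io.StringIO` only replaces the real path for illustration:

```python
import io
import numpy as np

# Made-up stand-in for the data file: 3 samples (rows) x 4 features (columns).
text = "1 2 3 4\n5 6 7 8\n9 10 11 12\n"

X = np.loadtxt(io.StringIO(text))
labels = np.loadtxt(io.StringIO(text))  # same source as X, as in the question

print(X.shape)                    # (3, 4)
print(labels.shape)               # (3, 4): a full matrix, not one label per sample
print(labels.size == X.shape[0])  # False: 12 elements for 3 points
```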
Here, the .txt file is the one that contains all the data. The rest of the code is van der Maaten's Python implementation, copied verbatim, available here if necessary.
In addition, the traceback states that both the color mapping and the RGBA conversion of `c` failed (a pretty severe failure), so detailed feedback would be much appreciated.
I'm very confused at this point: even after reading the solutions to other occurrences of this ValueError on Stack Overflow, I have no idea how to format "labels" so that the size of c matches x and y.
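For `c` to match `x` and `y`, `labels` has to be a 1-D array with exactly one numeric entry per sample, i.e. shape `(X.shape[0],)`. A minimal sketch of two ways to get that, using made-up in-memory data. Both layouts are assumptions: a separate labels file with one label per line, or (hypothetically) the label stored as the last column of the data file. If, as the `_tagremoved` file name suggests, the tags were stripped from the data file, the labels would have to come from the original file before removal.

```python
import io
import numpy as np

# Option 1 (assumed layout): a separate labels file, one numeric label per line.
data_text = "0.1 0.2 0.3 0.4\n0.5 0.6 0.7 0.8\n0.9 1.0 1.1 1.2\n"
labels_text = "0\n1\n1\n"

X = np.loadtxt(io.StringIO(data_text))
labels = np.loadtxt(io.StringIO(labels_text))  # single column loads as 1-D
assert labels.shape == (X.shape[0],)           # one label per point

# Option 2 (hypothetical layout): the label is the last column of the data file,
# so split it off instead of loading the file twice.
combined = np.loadtxt(io.StringIO("0.1 0.2 0\n0.3 0.4 1\n0.5 0.6 1\n"))
X2, labels2 = combined[:, :-1], combined[:, -1]
print(X2.shape, labels2.shape)  # (3, 2) (3,)
```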
Answer:
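The 4th positional parameter of pyplot.scatter is c, the marker color(s): a single color, a sequence of colors, or a sequence of numbers mapped to colors through a colormap. It is not a label, and scatter has no parameter for labels. I'd just remove the 4th parameter altogether.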
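Removing the 4th argument makes the error go away, but every point then gets the same default color. If per-class colors are still wanted, `c` can stay, provided it is a 1-D numeric array with one value per point. A minimal sketch of both calls, using random stand-in data (the `Y` and `labels` arrays below are hypothetical, not real t-SNE output) and passing `s` and `c` by keyword, which is equivalent to the 3rd and 4th positional arguments:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
Y = rng.normal(size=(2500, 2))           # stand-in for the t-SNE embedding
labels = rng.integers(0, 10, size=2500)  # hypothetical: one class per point

fig, (ax1, ax2) = plt.subplots(1, 2)

# As suggested: no 4th argument, so all points share one color.
sc1 = ax1.scatter(Y[:, 0], Y[:, 1], s=20)

# Per-class colors: c has exactly one numeric value per point,
# and the values are mapped to colors through the colormap.
sc2 = ax2.scatter(Y[:, 0], Y[:, 1], s=20, c=labels, cmap="tab10")

plt.close(fig)
```

In the notebook, `plt.show()` would replace `plt.close(fig)` to display the figure.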