Get number of Clusters (3D)

I have a question about clustering. When you use the k-means algorithm, you have to specify how many clusters you expect. My problem is that I have some runs where the number of clusters varies. I checked, and there are methods for determining how many clusters you have, but the ones I found work for two-dimensional problems. In my case, I have three features. Do you have an idea which algorithms I can use for a three-dimensional problem? I would be grateful for any help, because I did some research myself and could not find anything. :)

Here, for example, it should find two clusters: the single point as one cluster and the row of data points as the second.

In the second example, I expect the algorithm to find three clusters automatically: the long line, the short line, and the single point.

Thanks. :)

CodePudding user response：

As @ForceBru said in the comments, you can also use the k-means algorithm for 3D data. I always use sklearn.cluster.KMeans.

The key part of the example provided in the link above is the following:

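import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans

# Fix the global NumPy seed so the random initialisations are reproducible
np.random.seed(5)

# Load the iris data set: X holds the features, y the true labels
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Three KMeans configurations to compare
estimators = [
    ("k_means_iris_8", KMeans(n_clusters=8)),
    ("k_means_iris_3", KMeans(n_clusters=3)),
    ("k_means_iris_bad_init", KMeans(n_clusters=3, n_init=1, init="random")),
]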
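Since KMeans needs n_clusters up front, one common workaround is to try several values and keep the one with the best silhouette score. Below is a minimal sketch of that idea on 3D data. The helper name pick_k_by_silhouette, the range of k values, and the synthetic blobs are all my own assumptions for illustration, not part of the linked example. Note that the silhouette score needs at least two clusters, so it may not suit data with isolated single points.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pick_k_by_silhouette(X, k_values=range(2, 7), seed=0):
    """Hypothetical helper: return (best_k, labels) for the k with the highest silhouette score."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)  # in [-1, 1], higher is better
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels


# Made-up 3D data: three well-separated blobs with three features each
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [10, 10, 10], [-10, 10, 0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

best_k, labels = pick_k_by_silhouette(X)
print(best_k)
```

Nothing here is specific to three features; X can have any number of columns.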

You can also try the DBSCAN algorithm, which does not need the number of clusters in advance (though I am not an expert with it).

Study the DBSCAN parameters (mainly eps and min_samples) in the documentation and then adjust them to meet your goals.
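To give a rough idea of how DBSCAN could behave on data like the first example in the question, here is a small sketch with a line of points plus one isolated point in 3D. The coordinates and the values eps=0.5 and min_samples=1 are assumptions I picked for this toy data; real data will need its own tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up 3D data: a row of 50 points along the x-axis plus one isolated point
line = np.column_stack([np.linspace(0, 10, 50), np.zeros(50), np.zeros(50)])
single = np.array([[5.0, 8.0, 3.0]])
X = np.vstack([line, single])

# eps is the neighbourhood radius. min_samples=1 lets a lone point form
# its own cluster instead of being labelled as noise (-1).
db = DBSCAN(eps=0.5, min_samples=1).fit(X)
labels = db.labels_

# Count clusters, ignoring the noise label -1 if it occurs
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)
```

With a larger min_samples, the single point would get the label -1 (noise) rather than a cluster of its own, so that choice depends on whether isolated points should count as clusters.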

Finally, there are plenty of other clustering algorithms; take a look at them!
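As one example of such an alternative, scikit-learn's AgglomerativeClustering can be run with a distance threshold instead of a fixed cluster count. The sketch below uses made-up data resembling the second example in the question (a long line, a short line, and a single point); the threshold of 1.0 and the single linkage are assumptions chosen for this toy data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up 3D data: a long line, a short line, and one isolated point
long_line = np.column_stack([np.linspace(0, 10, 50), np.zeros(50), np.zeros(50)])
short_line = np.column_stack([np.linspace(0, 2, 10), np.full(10, 5.0), np.full(10, 5.0)])
single = np.array([[8.0, -6.0, 4.0]])
X = np.vstack([long_line, short_line, single])

# n_clusters=None plus distance_threshold: merging stops once the closest
# pair of clusters is at least 1.0 apart. Single linkage follows chains of
# nearby points, which suits line-shaped clusters.
model = AgglomerativeClustering(
    n_clusters=None, distance_threshold=1.0, linkage="single"
).fit(X)
print(model.n_clusters_)
```

The number of clusters found is then available as model.n_clusters_ and the per-point assignments as model.labels_.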

Hope it helps!