Is there any way to group the close numbers of a list with numpy?

I have a list:

x = [1,3,2,89,26,31,35,78,5,3,70]

What I want is something like below:

[[1,2,3,5], [26,31,35], [70,78,89]]

Is it possible to group the close integer elements of an integer list in Python?

CodePudding user response：

For a prespecified threshold thr, let's say that a group of integers is "close" if, once the group is sorted, each integer is less than thr away from the next largest one in the group. Under this definition, we can group the close numbers together with the following function.

import numpy as np

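def group(a, thr):
    # Sort so that close values end up next to each other.
    x = np.sort(a)
    # Gaps between consecutive sorted values.
    diff = x[1:] - x[:-1]
    # A gap of at least thr starts a new group; the running count is the group id.
    gps = np.concatenate([[0], np.cumsum(diff >= thr)])
    return [x[gps == i] for i in range(gps[-1] + 1)]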

For example, group([1,3,2,89,26,31,35,78,5,3,70],20) returns

[array([1, 2, 3, 3, 5]), array([26, 31, 35]), array([70, 78, 89])]
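
As a variation on the same idea, the grouping can also be written with np.split, cutting the sorted array right after every gap of at least thr. This is only a sketch: the name group_by_gaps is made up for illustration, and it returns plain Python lists rather than arrays.

```python
import numpy as np

def group_by_gaps(values, thr):
    # Hypothetical helper name; same grouping rule as group() above.
    x = np.sort(values)
    # Indices where a new group starts: one past every gap of at least thr.
    cuts = np.flatnonzero(np.diff(x) >= thr) + 1
    return [g.tolist() for g in np.split(x, cuts)]

data = [1, 3, 2, 89, 26, 31, 35, 78, 5, 3, 70]
print(group_by_gaps(data, 20))  # [[1, 2, 3, 3, 5], [26, 31, 35], [70, 78, 89]]
print(group_by_gaps(data, 10))  # [[1, 2, 3, 3, 5], [26, 31, 35], [70, 78], [89]]
```

Lowering the threshold from 20 to 10 splits 89 off from 70 and 78, which shows how much the result depends on the value picked for thr.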

CodePudding user response：

I realize your question is specifically about numpy, but as Ben pointed out, it would be up to you to decide the threshold, which may not be easy to do.

This sounds to me like a basic k-means exercise where you set the number of groups and let the model do the rest. In this example I've chosen 3 clusters to match your output, but ideally you might use something like the elbow method to pick the number of clusters that gives the best separation between groups.

from sklearn.cluster import KMeans
import numpy as np
from itertools import groupby

x = [1,3,2,89,26,31,35,78,5,3,70]
x = sorted(x)

kmeans = KMeans(n_clusters=3, random_state=0).fit(np.reshape(x,(-1,1)))

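# Pair each sorted value with its cluster label, then group consecutive values that share a label.
[[value for value, _ in pairs] for _, pairs in groupby(zip(x, kmeans.labels_), key=lambda pair: pair[1])]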

Output

[[1, 2, 3, 3, 5], [26, 31, 35], [70, 78, 89]]
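
The elbow method mentioned in this answer could be sketched roughly as follows: fit k-means for several values of k and look at how the inertia (the sum of squared distances to the cluster centers) drops. The range of k tried here (1 to 6) and n_init=10 are assumptions chosen for illustration, not values from the answer.

```python
import numpy as np
from sklearn.cluster import KMeans

x = np.array([1, 3, 2, 89, 26, 31, 35, 78, 5, 3, 70]).reshape(-1, 1)

# Inertia = sum of squared distances from each point to its cluster center.
ks = range(1, 7)  # assumed search range, for illustration only
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(x)
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

The "elbow" is the k after which the inertia stops dropping sharply. Reading it off the printed values (or a plot of them) is a judgment call rather than an exact rule.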