I am following this semantic clustering example:
!pip install sentence_transformers
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
embedder = SentenceTransformer('all-MiniLM-L6-v2')
# Corpus with example sentences
corpus = ['A man is eating food.',
          'A man is eating a piece of bread.',
          'A man is eating pasta.',
          'The girl is carrying a baby.',
          'The baby is carried by the woman',
          'A man is riding a horse.',
          'A man is riding a white horse on an enclosed ground.',
          'A monkey is playing drums.',
          'Someone in a gorilla costume is playing a set of drums.',
          'A cheetah is running behind its prey.',
          'A cheetah chases prey on across a field.'
          ]
corpus_embeddings = embedder.encode(corpus)
# Perform k-means clustering
num_clusters = 5
clustering_model = KMeans(n_clusters=num_clusters)
clustering_model.fit(corpus_embeddings)
cluster_assignment = clustering_model.labels_
# Group the sentences by their assigned cluster id
clustered_sentences = [[] for i in range(num_clusters)]
for sentence_id, cluster_id in enumerate(cluster_assignment):
    clustered_sentences[cluster_id].append(corpus[sentence_id])
# Print each cluster (numbered from 1), its sentences and its size
for i, cluster in enumerate(clustered_sentences):
    print("Cluster", i + 1)
    print(cluster)
    print(len(cluster))
    print("")
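The grouping step at the end depends only on the label array, so it can be tried without downloading the model. A minimal sketch, assuming a hypothetical hard-coded `labels` list standing in for `clustering_model.labels_`:

```python
# Small corpus taken from the example above
corpus = ['A man is eating food.',
          'A man is riding a horse.',
          'A man is eating pasta.',
          'A man is riding a white horse on an enclosed ground.']
# Hypothetical cluster ids, one per sentence (stand-in for clustering_model.labels_)
labels = [0, 1, 0, 1]
num_clusters = 2

# One empty list per cluster, then append each sentence to its cluster's list
clustered_sentences = [[] for _ in range(num_clusters)]
for sentence_id, cluster_id in enumerate(labels):
    clustered_sentences[cluster_id].append(corpus[sentence_id])

print(clustered_sentences)
```

The result is already a list of lists, with one inner list per cluster.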
This produces the following lists:
Cluster 1
['The girl is carrying a baby.', 'The baby is carried by the woman']
2
Cluster 2
['A man is riding a horse.', 'A man is riding a white horse on an enclosed ground.']
2
Cluster 3
['A man is eating food.', 'A man is eating a piece of bread.', 'A man is eating pasta.']
3
Cluster 4
['A cheetah is running behind its prey.', 'A cheetah chases prey on across a field.']
2
Cluster 5
['A monkey is playing drums.', 'Someone in a gorilla costume is playing a set of drums.']
2
How can I combine these separate lists into one list?
Expected outcome:
list2 = [['The girl is carrying a baby.', 'The baby is carried by the woman'], ..., ['A monkey is playing drums.', 'Someone in a gorilla costume is playing a set of drums.']]
I tried the following:
list2 = []
for i in cluster:
    list2.append(i)
list2
But it returns only the last one:
['A monkey is playing drums.',
'Someone in a gorilla costume is playing a set of drums.']
Any ideas?
Answer:
Following that example, you don't need to do anything to get a list of lists; that has already been done for you. Try printing clustered_sentences: it is already a list with one inner list per cluster.
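The attempt in the question returns only the last cluster because `cluster` is the loop variable of the printing loop: after that loop ends it still refers to the last inner list, so `for i in cluster` walks over that cluster's sentences. A minimal sketch with a hypothetical two-cluster `clustered_sentences` shows the leftover variable and, if a separate `list2` is wanted, building it by appending whole clusters:

```python
# Hypothetical stand-in for the clustered_sentences built in the question
clustered_sentences = [
    ['The girl is carrying a baby.', 'The baby is carried by the woman'],
    ['A monkey is playing drums.', 'Someone in a gorilla costume is playing a set of drums.'],
]

# Like the printing loop, this leaves `cluster` bound to the last inner list
for i, cluster in enumerate(clustered_sentences):
    pass

# The question's attempt: iterates over the sentences of the last cluster only
attempt = []
for i in cluster:
    attempt.append(i)

# Appending each cluster (not each sentence) reproduces the list of lists
list2 = []
for cluster in clustered_sentences:
    list2.append(cluster)

print(attempt)
print(list2 == clustered_sentences)
```

`list2` holds the same inner list objects as `clustered_sentences`; `[list(c) for c in clustered_sentences]` is one option if independent copies are needed.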
Answer:
Basically, you need to get a "flat" list from a list of lists. You can do that with a Python list comprehension:
flat = [item for sub in clustered_sentences for item in sub]
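A small self-contained check of the comprehension, using a hypothetical two-cluster list; the standard library's `itertools.chain.from_iterable` is shown alongside as an equivalent way to flatten one level:

```python
from itertools import chain

# Hypothetical stand-in for clustered_sentences
clustered_sentences = [
    ['A man is riding a horse.', 'A man is riding a white horse on an enclosed ground.'],
    ['A cheetah is running behind its prey.', 'A cheetah chases prey on across a field.'],
]

# Outer loop over clusters, inner loop over the sentences in each cluster
flat = [item for sub in clustered_sentences for item in sub]

# Same result using the standard library
flat_chain = list(chain.from_iterable(clustered_sentences))

print(flat)
print(len(flat))
```

Both forms keep the sentences in cluster order and flatten exactly one level of nesting.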