compare list of lists and create clusters

Time:03-15

I have a list that contains 10,000 lists of strings of different lengths. For this question, I will simplify it to a list of only 10 lists, as follows.

list = [['a','w','r', 't'], ['e','r', 't', 't', 'r', 'd', 's'], ['a','w','r', 't'], ['n', 'g', 'd', 'e', 's'], ['a', 'b', 'c'], ['t', 'f', 'h', 'd', 'p'], ['a', 'b', 'c'], ['a','w','r', 't'], ['s','c','d'], ['e','r', 't', 't', 'r', 'd', 's']]

What I want is to compare each list with all the other lists, group the similar lists into one new list (called a cluster), and also group their indices.

Expected output:

cluster_1_lists = [['a','w','r', 't'], ['a','w','r', 't'], ['a','w','r', 't']]

cluster_1_indices = [0,2,7]

cluster_2_lists = [['e','r', 't', 't', 'r', 'd', 's'],['e','r', 't', 't', 'r', 'd', 's']]

cluster_2_indices = [1,9]

cluster_3_lists = [['n', 'g', 'd', 'e', 's']]

cluster_3_indices = [3]

cluster_4_lists = [['a', 'b', 'c'], ['a', 'b', 'c']]

cluster_4_indices = [4,6]

cluster_5_lists = [['t', 'f', 'h', 'd', 'p']]

cluster_5_indices = [5]

cluster_6_lists = [['s','c','d']]

cluster_6_indices = [8]

Can you help me implement this in Python?

CodePudding user response:

Ok so here, I'll basically be using a dictionary to make a cluster. Here's what I've done:

list = [['a','w','r', 't'], ['e','r', 't', 't', 'r', 'd', 's'], ['a','w','r', 't'], ['n', 'g', 'd', 'e', 's'], ['a', 'b', 'c'], ['t', 'f', 'h', 'd', 'p'], ['a', 'b', 'c'], ['a','w','r', 't'], ['s','c','d'], ['e','r', 't', 't', 'r', 'd', 's']]
cluster = {}

# Create an empty entry for each distinct sublist and for its indices.
for i in list:
    cluster[''.join(i)] = []
    cluster[''.join(i) + '_indices'] = []
# Visit every index, including the last one.
for j in range(len(list)):
    for k in cluster:
        if ''.join(list[j]) == k:
            cluster[k].append(list[j])
            cluster[k + '_indices'].append(j)
print(cluster)

The first for loop creates a key from each joined sublist, because a list cannot be used as a dictionary key, and sets its value to an empty list. It also creates a matching '_indices' key. The second for loop walks over every index of the list and, for each one, iterates through the keys of cluster (the dict). When the joined sublist equals a key, the sublist is appended under that key and its index is appended under the matching '_indices' key. The output will look like this:

Output: {'awrt': [['a', 'w', 'r', 't'], ['a', 'w', 'r', 't'], ['a', 'w', 'r', 't']], 'awrt_indices': [0, 2, 7], 'erttrds': [['e', 'r', 't', 't', 'r', 'd', 's'], ['e', 'r', 't', 't', 'r', 'd', 's']], 'erttrds_indices': [1, 9], 'ngdes': [['n', 'g', 'd', 'e', 's']], 'ngdes_indices': [3], 'abc': [['a', 'b', 'c'], ['a', 'b', 'c']], 'abc_indices': [4, 6], 'tfhdp': [['t', 'f', 'h', 'd', 'p']], 'tfhdp_indices': [5], 'scd': [['s', 'c', 'd']], 'scd_indices': [8]}

Note: Creating a separate variable for every cluster, as in your expected output, would make the code messy. Python's dictionaries are meant for exactly this, which is why I've used one.
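
One caveat with joined-string keys: two different lists can join to the same string (for example ['ab', 'c'] and ['a', 'bc']). A minimal sketch of one way around that, assuming the elements are hashable, is to key the dictionary on tuple(sublist) and keep the lists and indices together in one entry. The name group_by_tuple is made up for illustration and is not part of the answer above.

```python
def group_by_tuple(lists):
    """Group identical sublists; keys are tuples, values hold lists and indices."""
    clusters = {}
    for index, sublist in enumerate(lists):
        # setdefault returns the existing entry or creates a fresh one.
        entry = clusters.setdefault(tuple(sublist), {"lists": [], "indices": []})
        entry["lists"].append(sublist)
        entry["indices"].append(index)
    return clusters


# Both sublists would join to 'abc', but their tuple keys stay distinct.
data = [['ab', 'c'], ['a', 'bc'], ['ab', 'c']]
clusters = group_by_tuple(data)
print(clusters[('ab', 'c')]["indices"])   # [0, 2]
print(clusters[('a', 'bc')]["indices"])   # [1]
```

Tuples preserve element order, so ['a', 'b'] and ['b', 'a'] still land in different clusters, which matches the exact-match comparison used in the answer.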

CodePudding user response:

Here is the working answer:

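cluster = {}
# One key per distinct sublist (list is the input list from the question).
for i in list:
    cluster[''.join(i)] = []
xx = []      # clusters of identical sublists
xx_idx = []  # indices of the members of each cluster
for k in cluster:
    yy = []
    yy_idx = []
    for j in range(len(list)):
        if k == ''.join(list[j]):
            yy.append(list[j])
            yy_idx.append(j)
    xx.append(yy)
    xx_idx.append(yy_idx)
print("output", xx)
print("indices:", xx_idx)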

Output:

output [[['a', 'w', 'r', 't'], ['a', 'w', 'r', 't'], ['a', 'w', 'r', 't']], [['e', 'r', 't', 't', 'r', 'd', 's'], ['e', 'r', 't', 't', 'r', 'd', 's']], [['n', 'g', 'd', 'e', 's']], [['a', 'b', 'c'], ['a', 'b', 'c']], [['t', 'f', 'h', 'd', 'p']], [['s', 'c', 'd']]]

indices: [[0, 2, 7], [1, 9], [3], [4, 6], [5], [8]]
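
For the full 10,000-list input, the nested loops above compare every key against every sublist. A single-pass sketch that should produce the same two nested lists is shown here, assuming clusters are to be ordered by first appearance. The function name cluster_identical is hypothetical, and tuple keys are swapped in for the joined strings used above.

```python
def cluster_identical(lists):
    """Return (clusters, indices), with clusters ordered by first appearance."""
    position = {}   # tuple(sublist) -> position of its cluster in the outputs
    clusters = []
    indices = []
    for index, sublist in enumerate(lists):
        key = tuple(sublist)
        if key not in position:
            # First time this sublist is seen: open a new cluster.
            position[key] = len(clusters)
            clusters.append([])
            indices.append([])
        clusters[position[key]].append(sublist)
        indices[position[key]].append(index)
    return clusters, indices


data = [['a', 'w', 'r', 't'], ['e', 'r', 't', 't', 'r', 'd', 's'],
        ['a', 'w', 'r', 't'], ['n', 'g', 'd', 'e', 's'], ['a', 'b', 'c'],
        ['t', 'f', 'h', 'd', 'p'], ['a', 'b', 'c'], ['a', 'w', 'r', 't'],
        ['s', 'c', 'd'], ['e', 'r', 't', 't', 'r', 'd', 's']]
clusters, indices = cluster_identical(data)
print(indices)   # [[0, 2, 7], [1, 9], [3], [4, 6], [5], [8]]
```

Each sublist is looked up in the dictionary once, so the work grows roughly linearly with the number of sublists instead of with the number of sublists times the number of distinct keys.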
