Home > database >  How to get the indices of reversed duplicates in a list
How to get the indices of reversed duplicates in a list

Time:02-18

I have the following list:

list1 =[('a','b'),('c','d'),('e','f'),('g','h'),('b','a'),('e','d'),('e','g'),('h','g')]

I wish to append the indices of reversed duplicates from this list. For example:

('a','b') == ('b','a')
('g','h') == ('h','g')

I tried

exec =[]
for i,x in enumerate(list1):
    z=x[::-1]
    if z in list1:
        exec.append(i)
        list1.remove(z)

I got:

exec
[0, 3]

Which is correct. However, this is very inefficient when running on a 10 million elements of list. I know that I can directly remove the reversed duplicates by:

data = list({tuple(sorted(item)) for item in list1})

But I only want to identify the indices of reversed duplicates here. Is there a better way to do it? Thanks in advance.

CodePudding user response:

Turn the list into a set, to make membership testing efficient.

Don't remove from a list while you're iterating over it, so append the elements you're keeping to a new list.

execlist = []
set1 = set(list1)
newlist1 = []
for i, x in enumerate(list1):
    if x[::-1] in set1:
        execlist.append(i)
    else:
        newlist1.append(x)

CodePudding user response:

You can use tuples as a key of a dictionary, so create a dictionary where the sorted tuples are keys and the values are a list of indices where that tuple occurs in your list.

indices_dict = {}

for index, item in enumerate(list1):
    s_item = tuple(sorted(item))
    if s_item not in indices_dict:
        indices_dict[s_item] = [index]  
    else:
        indices_dict[s_item].append(index)

This gives:

{('a', 'b'): [0, 4],
 ('c', 'd'): [1],
 ('e', 'f'): [2],
 ('g', 'h'): [3, 7],
 ('d', 'e'): [5],
 ('e', 'g'): [6]}

Now, we need to find the elements that occur more than once, so:

repeated_indices = []
for s_item, indices in indices_dict.items():
    if len(indices) > 1:
        repeated_indices.append(indices[0])

Which gives us [0, 3]

  • Related