I have the following list:
list1 =[('a','b'),('c','d'),('e','f'),('g','h'),('b','a'),('e','d'),('e','g'),('h','g')]
I wish to append the indices of reversed duplicates from this list. For example:
('a','b') == ('b','a')
('g','h') == ('h','g')
I tried
exec =[]
for i,x in enumerate(list1):
z=x[::-1]
if z in list1:
exec.append(i)
list1.remove(z)
I got:
exec
[0, 3]
Which is correct. However, this is very inefficient when running on a 10 million elements of list. I know that I can directly remove the reversed duplicates by:
data = list({tuple(sorted(item)) for item in list1})
But I only want to identify the indices of reversed duplicates here. Is there a better way to do it? Thanks in advance.
CodePudding user response:
Turn the list into a set, to make membership testing efficient.
Don't remove from a list while you're iterating over it, so append the elements you're keeping to a new list.
execlist = []
set1 = set(list1)
newlist1 = []
for i, x in enumerate(list1):
if x[::-1] in set1:
execlist.append(i)
else:
newlist1.append(x)
CodePudding user response:
You can use tuples as a key of a dictionary, so create a dictionary where the sorted tuples are keys and the values are a list of indices where that tuple occurs in your list.
indices_dict = {}
for index, item in enumerate(list1):
s_item = tuple(sorted(item))
if s_item not in indices_dict:
indices_dict[s_item] = [index]
else:
indices_dict[s_item].append(index)
This gives:
{('a', 'b'): [0, 4],
('c', 'd'): [1],
('e', 'f'): [2],
('g', 'h'): [3, 7],
('d', 'e'): [5],
('e', 'g'): [6]}
Now, we need to find the elements that occur more than once, so:
repeated_indices = []
for s_item, indices in indices_dict.items():
if len(indices) > 1:
repeated_indices.append(indices[0])
Which gives us [0, 3]