Remove groups of adjacent duplicates from list while preserving order

Time:03-14

There are a lot of similar questions (like this one), but I did not find anything that suited my needs.

My objective is to remove groups of adjacent duplicates from a list.
For instance, if my list is

['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'C']

my desired output is

['A', 'B', 'C', 'A', 'C']

i.e. every group of adjacent duplicates is collapsed so that only one element of the group remains.


My code so far uses a for loop with a condition:

def reduce_duplicates(l):
    
    assert len(l) > 0, "Passed list is empty."
    
    result = [l[0]]   # initialization
    
    for i in l:
        if i != result[-1]:
            result.append(i)
    
    return result


l = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'C']
print(reduce_duplicates(l))
# ['A', 'B', 'C', 'A', 'C']

It produces the expected output, but I suspect there is a native, optimized and elegant way to achieve the same result. Is there one?

CodePudding user response:

Use groupby from itertools:

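from itertools import groupby

lst = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'C']
# groupby yields one (key, group) pair per run of equal neighbours; keep the keys
out = [k for k, _ in groupby(lst)]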
print(out)

# Output
['A', 'B', 'C', 'A', 'C']
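
To see why this works, it can help to look at what groupby actually yields: one (key, group) pair for each run of consecutive equal elements. Here is a minimal sketch that reports the length of each run; the helper name runs is made up for illustration and is not part of itertools.

```python
from itertools import groupby

def runs(iterable):
    """Return (value, run_length) pairs, one per run of consecutive equal items."""
    # Each group is a one-shot iterator, so count its items while consuming it.
    return [(key, sum(1 for _ in group)) for key, group in groupby(iterable)]

lst = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'C']
print(runs(lst))
# [('A', 3), ('B', 3), ('C', 2), ('A', 1), ('C', 2)]
```

Keeping only the keys of those pairs gives the deduplicated list.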

Update

You can also use zip_longest from itertools:

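from itertools import zip_longest

lst = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'C']
# pair each element with its successor; the last element is paired with None
out = [l for l, r in zip_longest(lst, lst[1:]) if l != r]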
print(out)

# Output
['A', 'B', 'C', 'A', 'C']
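
One caveat with the zip_longest version: the default fill value is None, so a list whose last element is None would lose that element, because the final pair compares None with None. A possible workaround, sketched here under the assumption that the list may contain None, is to pass a unique sentinel as fillvalue. The names _SENTINEL and dedupe_adjacent are hypothetical.

```python
from itertools import zip_longest

# Hypothetical sentinel: a fresh object that ordinary values do not compare equal to.
_SENTINEL = object()

def dedupe_adjacent(lst):
    """Keep an element only if it differs from its successor."""
    # The last element is paired with the sentinel, so it is always kept.
    return [l for l, r in zip_longest(lst, lst[1:], fillvalue=_SENTINEL) if l != r]

print(dedupe_adjacent([1, 1, None, None]))
# [1, None]
```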

Or without any imports:

lst = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'C']
out = [lst[0]] + [r for l, r in zip(lst, lst[1:]) if l != r]
print(out)

# Output
['A', 'B', 'C', 'A', 'C']
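
Note that indexing lst[0] raises IndexError on an empty list. If empty input is possible, one option is to slice instead of index, since lst[:1] is simply [] for an empty list. A small sketch follows; the function name dedupe_adjacent_no_imports is made up for illustration.

```python
def dedupe_adjacent_no_imports(lst):
    """Collapse runs of adjacent duplicates; returns [] for an empty list."""
    # lst[:1] is [first] for a non-empty list and [] for an empty one.
    return lst[:1] + [r for l, r in zip(lst, lst[1:]) if l != r]

print(dedupe_adjacent_no_imports(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'C']))
# ['A', 'B', 'C', 'A', 'C']
print(dedupe_adjacent_no_imports([]))
# []
```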

CodePudding user response:

The itertools documentation provides a recipe for exactly this, unique_justseen. Since it uses map, it may be a tiny bit faster than a regular list comprehension, and it also supports a key function. Note that it returns a lazy iterator, so wrap the call in list() if you need a list.

import operator
from itertools import groupby

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(operator.itemgetter(1), groupby(iterable, key)))
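
As a usage sketch, here is the recipe applied to the list from the question, plus a case-insensitive call using the key argument. The definition is repeated only so that the snippet runs on its own.

```python
import operator
from itertools import groupby

def unique_justseen(iterable, key=None):
    # Same recipe as above, repeated so this snippet is self-contained.
    return map(next, map(operator.itemgetter(1), groupby(iterable, key)))

lst = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'C']
print(list(unique_justseen(lst)))
# ['A', 'B', 'C', 'A', 'C']

# With a key function, neighbours are compared by key; the first item of each run is kept.
print(list(unique_justseen(['a', 'A', 'b', 'B', 'a'], key=str.lower)))
# ['a', 'b', 'a']
```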