How to remove duplicate chars in a string?

Time:12-15

I've got this problem and I simply can't get it right. I have to remove duplicated chars from a string.

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"

The output should be: "o rato roeu a roupa do rei de roma"

I tried things like:

def remove_duplicates(value):
    var = ""
    for i in value:
        if i in value:  # always true, because i comes from value
            if i in var:
                pass
            else:
                var = var + i
    return var

print(remove_duplicates(phrase))

But it's not there yet...

Any pointers to guide me here?

CodePudding user response:

It seems from your example that you want to remove REPEATED SEQUENCES of characters, not duplicate chars across the whole string. So this is what I'm solving here.

You can use a regular expression. I'm not sure how inefficient it is, but it works.

>>> import re
>>> phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'o rato roeu a roupa do rei de roma'

How this substitution proceeds down the string:

oo -> o
" " -> " "
rara -> ra
to -> to
" " -> " "
roeroe -> roe

etc.
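To see which runs the pattern actually collapses, here is a minimal sketch that lists every match next to what it is replaced with. The helper name `trace_repeats` is made up for illustration, and the pattern `(.+?)\1+` is assumed to be the one intended in the answer (a lazily matched group followed by one or more copies of itself).

```python
import re

# Assumed pattern: shortest possible group, followed by one or more repeats of it.
REPEAT = re.compile(r'(.+?)\1+')

def trace_repeats(text):
    """Return (matched_run, kept_part) pairs for every run the pattern collapses."""
    return [(m.group(0), m.group(1)) for m in REPEAT.finditer(text)]

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"

for matched, kept in trace_repeats(phrase):
    print(f"{matched!r} -> {kept!r}")

# The same pattern drives the substitution itself.
print(REPEAT.sub(r'\1', phrase))
```

Only the repeated runs show up in the trace; text such as "to" or the spaces never matches and is simply copied through by `re.sub`.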

Edit: Works for the other example string which should not be modified:

>>> phrase = "Barbara Bebe com Bernardo"
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'Barbara Bebe com Bernardo'

CodePudding user response:

What you can do is form a set out of the string and then sort the remaining letters according to their original order.

def remove_duplicates(word):
    unique_letters = set(word)
    sorted_letters = sorted(unique_letters, key=word.index) # this will give you a list
    return ''.join(sorted_letters)

words = phrase.split(' ')
new_phrase = ' '.join(remove_duplicates(word) for word in words)
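For reference, here is the same idea as a self-contained sketch that can be run as is. The wrapper `dedupe_phrase` is a hypothetical name added only for the demo. Note that this approach removes every repeated letter inside a word, so it also shortens words whose letters legitimately repeat, as the second call shows.

```python
def remove_duplicates(word):
    # Unique letters, put back in the order of their first appearance in the word.
    unique_letters = set(word)
    sorted_letters = sorted(unique_letters, key=word.index)
    return ''.join(sorted_letters)

def dedupe_phrase(phrase):
    # Hypothetical helper: apply remove_duplicates to each space-separated word.
    return ' '.join(remove_duplicates(word) for word in phrase.split(' '))

print(dedupe_phrase("oo rarato roeroeu aa rouroupa dodo rerei dde romroma"))
# o rato roeu a roupa do rei de roma

print(dedupe_phrase("Barbara Bebe com Bernardo"))
# Barb Beb com Bernado
```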

CodePudding user response:

A string in Python is a sequence of characters, right? Sequences can contain duplicates; sets cannot. So if we convert the string to a set and join it back, we get its characters without duplicates (though a set does not keep their original order) ;P

I've seen a suggestion to use a regex for replacing patterns. That will work, but it is a slow and overcomplicated solution (and not very human-friendly to read). Regex is a heavy and costly weapon.

Also, you are not removing duplicates from the whole string, but from each word in the string:

  1. First, split your string into a list of words.
  2. For each word, remove the duplicate letters.
  3. Join the words back into a string.

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
words = phrase.split(' ')
print(words)
# ['oo', 'rarato', 'roeroeu', 'aa', 'rouroupa', 'dodo', 'rerei', 'dde', 'romroma']

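words_without_duplicates = []
for word in words:
    # set() drops repeated letters, but it does not keep their order
    word_without_duplicates = ''.join(set(word))
    words_without_duplicates.append(word_without_duplicates)
phrase = ' '.join(words_without_duplicates)
print(phrase)
# for example: 'o oatr oeur a auopr od eir ed oamr'
# (the letter order inside each word can differ between runs)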
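The set-based loop scrambles the letters inside each word, because sets are unordered. One possible way to keep the first-seen order, sketched here as a swapped-in technique rather than what the answer wrote, is `dict.fromkeys`, which keeps keys unique and in insertion order (guaranteed since Python 3.7). The function name `unique_letters_in_order` is an illustrative assumption.

```python
def unique_letters_in_order(word):
    # dict keys are unique and remember insertion order (Python 3.7+),
    # so this keeps only the first occurrence of each letter, in order.
    return ''.join(dict.fromkeys(word))

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
words = phrase.split(' ')
result = ' '.join(unique_letters_in_order(word) for word in words)
print(result)
# o rato roeu a roupa do rei de roma
```

Like any per-word letter deduplication, this would also turn a word such as "Barbara" into "Barb", so it only fits inputs where no word legitimately repeats a letter.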

Of course, that can be optimized, but you wanted to be guided, so this version is better for showing the idea. It will likely be faster than the regex, too.
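The speed claim is easy to check rather than take on faith. Below is a rough sketch using the standard `timeit` module; the function names `with_regex` and `with_sets` are made up for this comparison, the regex is assumed to be `(.+?)\1+`, and the numbers will depend on your machine and input, so no result is quoted here.

```python
import re
import timeit

PHRASE = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
REPEAT = re.compile(r'(.+?)\1+')  # assumed pattern from the regex answer

def with_regex(text):
    # Collapse repeated sequences with a single substitution.
    return REPEAT.sub(r'\1', text)

def with_sets(text):
    # Per-word set deduplication (letter order is not preserved).
    return ' '.join(''.join(set(word)) for word in text.split(' '))

for func in (with_regex, with_sets):
    seconds = timeit.timeit(lambda: func(PHRASE), number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 calls")
```

Keep in mind the two functions do not produce the same output, so this compares cost only, not correctness.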
