How do you remove all characters in a string, except for those in a list?

Time:10-07

I have a list of strings and a list of characters I don't want. How do I remove the characters that are in the list from each string? For example:

l = ["Bananas :)", "apple :("]
characters_i_dont_want = [":", "a"]
for i in l:
    replace_all_characters_except_for_those_in_list(characters_i_dont_want)
    print(i)

output:

Bnns )
pple (

CodePudding user response:

You could combine all unwanted characters into a regular expression:

import re

pattern = f'[{"".join(characters_i_dont_want)}]'
for i in l:
    cleaned = re.sub(pattern, '', i)
    print(cleaned)

gives

Bnns )
pple (

Just be careful when you want to remove characters that have special meaning inside character classes: - (dash), [ and ] (brackets), \ (backslash) and ^ (caret). Those have to be escaped, for example with re.escape().
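One way to sidestep the escaping issue is to pass each character through re.escape() before joining them into the character class. This is only a minimal sketch: the extra characters in the list and the helper name remove_unwanted are made up for illustration.

```python
import re

# Characters to remove, including some that are special inside [...]
# (the extra entries beyond ":" and "a" are illustrative assumptions)
characters_i_dont_want = [":", "a", "-", "]", "^", "\\"]

# re.escape() backslash-escapes each character so the class stays valid
pattern = "[" + "".join(re.escape(c) for c in characters_i_dont_want) + "]"

def remove_unwanted(text):
    """Return text with every character matched by the class removed."""
    return re.sub(pattern, "", text)

print(remove_unwanted("Bananas :)"))   # Bnns )
print(remove_unwanted("a-b]c^d\\e"))   # bcde
```

Without the escaping, a list containing "]" or "\" would either close the class early or fail to compile.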

CodePudding user response:

You can do something like this:

l = ["Bananas :)", "apple :("]

for word in l:
    replaced = ''
    for char in word:
        # Check whether the character is 'a' or ':'
        # If so, keep it unchanged
        if char == ':' or char == 'a':
            replaced += char
        else:
            # Otherwise append '*' in its place
            replaced += char.replace(char, '*')
            
    print(replaced)

Here I append each character to the replaced variable; any character other than a or : is replaced with * before it is appended. Note: .replace(toReplace, replaceWith) returns a new string and does not alter the original string.
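If the goal is the one in the title (keep only the listed characters and drop everything else instead of masking it with *), a generator expression with str.join() is a compact alternative. This is a sketch, not part of the answer above: the names characters_to_keep and keep_only are hypothetical.

```python
# Hypothetical whitelist; a set gives fast membership checks
characters_to_keep = {":", "a"}

def keep_only(text, allowed):
    """Return text with every character not in `allowed` dropped."""
    return "".join(ch for ch in text if ch in allowed)

for word in ["Bananas :)", "apple :("]:
    print(keep_only(word, characters_to_keep))
# aaa:
# a:
```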

CodePudding user response:

There are many ways to solve this. You already have some of them listed in the other answers. Here's one more way to do it.

In this example, I use the regex OR operator (|) to join all the unwanted characters into one compiled pattern.

import re

characters_i_dont_want = [":", "a"]
strings = ["Bananas :)", "apple :(", ":( Catamaran )"]

# escape each character with map, then join them with | to build the pattern
p = re.compile('|'.join(map(re.escape, characters_i_dont_want)))  # escape to handle metacharacters

# then use sub to replace every match with an empty string
for s in strings:
    print(p.sub('', s))

The output of this will be:

Bnns )
pple (
( Ctmrn )

CodePudding user response:

Solution using str.translate()

Here's an approach that hasn't been covered yet:

words = ["Bananas :)", "apple :("]
characters_i_dont_want = [":", "a"]
t = str.maketrans(dict.fromkeys(characters_i_dont_want, None))

for s in words:
    print(s.translate(t))

Output:

Bnns )
pple (

You can also update the translation table at any point, adding new characters, or even removing some (with .pop() or del):

t.update(dict.fromkeys(map(ord, "snp"), None))  # add 's', 'n', and 'p'
t.pop(ord("a"))  # drop 'a' from list of characters to remove

for s in words:
    print(s.translate(t))

Output:

Baaa )
ale (

Explanation

In general, to make a translation table for strings, you simply need a mapping with keys as ordinals. You can do this either by passing a dictionary of character mappings to str.maketrans(), e.g. {"a": "X"} which str.maketrans() will turn into {97: 'X'}, or you can skip str.maketrans() and map the ordinals yourself like I did in the second example with map(ord, "snp").

dict.fromkeys() takes an iterable and a default value and constructs the dictionary for you, so you don't need to write the comprehension {k: None for k in map(ord, "snp")}, which I personally find tedious at times.
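The equivalence described above can be checked directly. A small sketch, where the variable names t1, t2 and t3 are only for illustration:

```python
chars = [":", "a"]

t1 = str.maketrans(dict.fromkeys(chars, None))  # maketrans converts the keys to ordinals
t2 = dict.fromkeys(map(ord, chars), None)       # ordinals mapped by hand
t3 = {ord(c): None for c in chars}              # the explicit comprehension

print(t1 == t2 == t3)                 # True
print(str.maketrans({"a": "X"}))      # {97: 'X'}
print("Bananas :)".translate(t2))     # Bnns )
```

All three are plain dictionaries keyed by code point, which is why the table can be edited later with the usual dict methods.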

Another way to build a translation table is the 3-argument form of str.maketrans(), where all three arguments must be strings. The characters in the first string map to the characters at the same positions in the second string (the two must be the same length), and the characters in the third string map to None. By passing two empty strings as the first two arguments, you can create a translation table purely for removing characters:

t = str.maketrans("", "", "a:")
# Bnns )
# pple (

However this method is particularly nice if you not only want to remove characters, but also translate characters:

t = str.maketrans("a", "A", ":()")  # map "a" to "A" and remove ":", "(", and ")"
# BAnAnAs
# Apple

Just note that removal takes precedence: a character listed in both the first and the third argument is removed, not translated:

t = str.maketrans("a", "A", "a")  # map "a" to "A" and remove "a"
# Bnns :)
# pple :(

Confirming by checking what str.maketrans("a", "A", "a") returned:

>>> t
{97: None}

A note on performance

Where the str.translate() technique really shines is the fact that it's just a lookup table at the end of the day. Here's a comparison with the regex solution from @JoeFerndz, using a string of 100,000 random characters, and a list of 32 characters to remove. Spoiler, str.translate() is ~12x faster than using regex.

In [1]: import re
   ...: import random
   ...: import string

In [2]: s = "".join(random.choice(string.printable) for _ in range(100_000))
   ...: banned = string.punctuation  # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

In [3]: p = re.compile('|'.join(map(re.escape, banned)))  # Joe's regex pattern
   ...: t = str.maketrans("", "", banned)  # my translation table

In [4]: s.translate(t) == p.sub("", s)
Out[4]: True

In [5]: %timeit s.translate(t)
364 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit p.sub("", s)
4.61 ms ± 12.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Another solution is to simply iterate over the characters to replace, which performs quite well (better than regex, even):

In [7]: def with_replace(s, banned):
   ...:     for char in banned:
   ...:         s = s.replace(char, "")
   ...:     return s
   ...:

In [8]: with_replace(s, banned) == p.sub("", s) == s.translate(t)
Out[8]: True

In [9]: %timeit with_replace(s, banned)
2.2 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Now compare any of those solutions to the worst performer of them all, creating the new string character-by-character by filtering:

In [10]: def char_by_char(s, banned):
    ...:     result = ""
    ...:     for char in s:
    ...:         if char not in banned:
    ...:             result += char
    ...:     return result
    ...:

In [11]: char_by_char(s, banned) == p.sub("", s) == s.translate(t)
Out[11]: True

In [12]: %timeit char_by_char(s, banned)
8.35 ms ± 207 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
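To repeat the comparison outside IPython, the standard timeit module can stand in for %timeit. This is a sketch that reuses the functions above; the fixed seed, the repeat count of 10 and the candidates dict are my own choices, and the numbers it prints will vary with the machine and Python version.

```python
import random
import re
import string
import timeit

random.seed(0)  # fixed seed so repeated runs use the same input
s = "".join(random.choice(string.printable) for _ in range(100_000))
banned = string.punctuation

p = re.compile("|".join(map(re.escape, banned)))  # regex alternation pattern
t = str.maketrans("", "", banned)                 # translation table

def with_replace(s, banned):
    for char in banned:
        s = s.replace(char, "")
    return s

def char_by_char(s, banned):
    result = ""
    for char in s:
        if char not in banned:
            result += char
    return result

candidates = {
    "translate": lambda: s.translate(t),
    "regex": lambda: p.sub("", s),
    "replace": lambda: with_replace(s, banned),
    "char_by_char": lambda: char_by_char(s, banned),
}

# All approaches must produce the same string before timing means anything
results = [func() for func in candidates.values()]
assert all(r == results[0] for r in results)

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=10) / 10  # average over 10 calls
    print(f"{name:>12}: {seconds * 1e3:.3f} ms per call")
```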

CodePudding user response:

l = ["Bananas :)", "apple :("]
characters_i_dont_want = [":", "a"]
for i in l:
    for ch in characters_i_dont_want:
        i = i.replace(ch, '')  # replace the character with an empty string
    print(i)
 

I think this might be an easier solution.
A regex can help, and you can use it if you are familiar with regular expressions. Otherwise, a plain string replace as shown above is more than sufficient.
see more about it here
