How do I remove items that don't consist only of certain characters from a list?

Time:05-12

I need to remove the items that contain characters other than "-" and "." from an arbitrary list.

For example:

I have this list:

['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']

An item in the list may consist only of "-" and ".", so the output needs to be:

['.-', '-...', '-.-.', '-..', '.', '.---', '-.-']

If we take another random list:

[".-","-...","-.-.","-..",".","..-.    teveel kolommen",".---"]

Then the output needs to be:

[".-","-...","-.-.","-..",".",".---"]

Can someone please explain to me how I can do this without using a function?

CodePudding user response:

Use set operations:

>>> [s for s in lst if set(s).issubset(set(".-"))]

Examples:

>>> lst = ['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']
>>> [s for s in lst if set(s).issubset(set(".-"))]
['.-', '-...', '-.-.', '-..', '.', '.---', '-.-']

>>> lst = [".-","-...","-.-.","-..",".","..-.    teveel kolommen",".---"]
>>> [s for s in lst if set(s).issubset(set(".-"))]
['.-', '-...', '-.-.', '-..', '.', '.---']
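
To see why this works, here is a minimal sketch that spells out the subset test without defining a function. The names `allowed` and `kept` and the sample list are only illustrative. Note that an empty string passes the test, because the empty set is a subset of every set:

```python
# Characters an item is allowed to contain (illustrative name).
allowed = set(".-")

# Sample input: one valid item, one with a stray letter, one empty string.
lst = ['.-', '.p..', '', '-.-']

# set(s) is the set of distinct characters in s; `<=` is the subset test,
# equivalent to set(s).issubset(allowed).
kept = [s for s in lst if set(s) <= allowed]

print(kept)
```

If empty strings should be dropped as well, adding `s and` in front of the condition is one way to do it.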

CodePudding user response:

Use re.search with a regular expression:

import re
new_lst = [s for s in lst if re.search(r'^[-.]*$', s)]

Here, ^ matches the start of the string, $ matches the end of the string (or just before a trailing newline), [-.] is a character class that matches either of 2 characters (dash and period), and * is the quantifier that says: repeat the previous item 0 or more times.
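
One detail worth knowing: in Python, `$` also matches just before a newline at the very end of the string, so `'.-\n'` passes the `re.search` version. A small sketch of the difference, assuming that such strings could appear in the input (the sample list and variable names are made up for illustration):

```python
import re

# Sample input, including one item with a trailing newline.
lst = ['.-', '.p..', '.-\n', '']

# `$` matches at the end of the string or just before a final newline,
# so '.-\n' is kept here.
with_search = [s for s in lst if re.search(r'^[-.]*$', s)]

# fullmatch requires the whole string to match, so '.-\n' is rejected.
with_fullmatch = [s for s in lst if re.fullmatch(r'[-.]*', s)]

print(with_search)
print(with_fullmatch)
```

If strings with a trailing newline should be rejected, `re.fullmatch` (or `\Z` in place of `$`) is the stricter choice.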

CodePudding user response:

Benchmarks of more versions (time per call; the three fastest of 20 rounds for each function):

['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']
2.02 us  2.02 us  2.03 us  filterfalse__re_search
2.03 us  2.05 us  2.05 us  filter__issuperset
2.52 us  2.52 us  2.54 us  filter__re_match
2.53 us  2.54 us  2.55 us  filter__re_fullmatch
2.68 us  2.71 us  2.72 us  filter__re_search
4.01 us  4.10 us  4.12 us  listcomp__issubset
5.88 us  5.93 us  5.99 us  listcomp__not_re_search
6.46 us  6.48 us  6.55 us  listcomp__re_fullmatch
6.81 us  6.85 us  6.93 us  listcomp__re_search

['.-', '-...', '-.-.', '-..', '.', '..-.    teveel kolommen', '.---']
1.90 us  1.92 us  1.94 us  filterfalse__re_search
2.06 us  2.09 us  2.10 us  filter__issuperset
2.31 us  2.35 us  2.35 us  filter__re_fullmatch
2.33 us  2.34 us  2.41 us  filter__re_match
2.67 us  2.67 us  2.69 us  filter__re_search
3.88 us  3.89 us  3.92 us  listcomp__issubset
5.21 us  5.24 us  5.30 us  listcomp__not_re_search
5.80 us  5.82 us  5.83 us  listcomp__re_fullmatch
6.14 us  6.16 us  6.26 us  listcomp__re_search

Code:

def listcomp__issubset(lst):
    return [s for s in lst if set(s).issubset(set(".-"))]

def filter__issuperset(lst):
    return [*filter(set('.-').issuperset, lst)]

def listcomp__re_search(lst):
    return [s for s in lst if re.search(r'^[-.]*$', s)]

def listcomp__re_fullmatch(lst):
    return [s for s in lst if re.fullmatch(r'[-.]*', s)]

def listcomp__not_re_search(lst):
    return [s for s in lst if not re.search(r'[^-.]', s)]

def filter__re_search(lst):
    return [*filter(re.compile(r'^[-.]*$').search, lst)]

def filter__re_fullmatch(lst):
    return [*filter(re.compile(r'[-.]*').fullmatch, lst)]

def filter__re_match(lst):
    return [*filter(re.compile(r'[-.]*$').match, lst)]

def filterfalse__re_search(lst):
    return [*filterfalse(re.compile(r'[^-.]').search, lst)]

funcs = [
    listcomp__issubset,
    filter__issuperset,
    listcomp__re_search,
    listcomp__re_fullmatch,
    listcomp__not_re_search,
    filter__re_search,
    filter__re_fullmatch,
    filter__re_match,
    filterfalse__re_search,
]

from timeit import repeat
from random import shuffle
from bisect import insort
import re
from itertools import filterfalse

def test(lst, expect):
    print(lst)
    for func in funcs:
        result = func(lst)
        assert result == expect, func.__name__
    times = {func: [] for func in funcs}
    for _ in range(20):
        shuffle(funcs)
        for func in funcs:
            t = min(repeat(lambda: func(lst), number=1000)) / 1000
            insort(times[func], t)
    for func in sorted(funcs, key=times.get):
        print(*('%.2f us ' % (t * 1e6) for t in times[func][:3]), func.__name__)
    print()

test(['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-'],
     ['.-', '-...', '-.-.', '-..', '.', '.---', '-.-'])
test([".-","-...","-.-.","-..",".","..-.    teveel kolommen",".---"],
     [".-","-...","-.-.","-..",".",".---"])