I need to remove every item that contains characters other than "-" and "." from a list.
For example, given this list:
['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']
An item in the list may consist only of "-" and ".", so the output needs to be:
['.-', '-...', '-.-.', '-..', '.', '.---', '-.-']
If we take another random list:
[".-","-...","-.-.","-..",".","..-. teveel kolommen",".---"]
Then the output needs to be:
[".-","-...","-.-.","-..",".",".---"]
Can someone please explain to me how I can do this without using a function?
CodePudding user response:
Use set operations:
>>> [s for s in lst if set(s).issubset(set(".-"))]
Examples:
>>> lst = ['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']
>>> [s for s in lst if set(s).issubset(set(".-"))]
['.-', '-...', '-.-.', '-..', '.', '.---', '-.-']
>>> lst = [".-","-...","-.-.","-..",".","..-. teveel kolommen",".---"]
>>> [s for s in lst if set(s).issubset(set(".-"))]
['.-', '-...', '-.-.', '-..', '.', '.---']
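If "without using a function" means without defining one, the same filter can be sketched as a plain loop. This is only an illustration: the names allowed and result are made up for this example. As with issubset, an empty string would pass the check.

```python
lst = ['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']

allowed = ".-"   # hypothetical name: the only characters an item may contain
result = []      # hypothetical name: collects the items that pass the check

for s in lst:
    # Keep s only if every one of its characters is a dash or a period.
    if all(ch in allowed for ch in s):
        result.append(s)

print(result)  # ['.-', '-...', '-.-.', '-..', '.', '.---', '-.-']
```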
CodePudding user response:
Use re.search with a regular expression:
import re
new_lst = [s for s in lst if re.search(r'^[-.]*$', s)]
Here, ^ is the start of the string, $ is the end of the string (or the position just before a trailing newline), [-.] is a character class that consists of two characters (dash and period), and * is the quantifier that says: repeat the previous item 0 or more times.
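One edge case may matter depending on where the list comes from: in Python, $ also matches just before a trailing newline, so a string such as '.-\n' passes the re.search check. The sketch below compares it with re.fullmatch, which requires the whole string to match. The samples list is made up for illustration.

```python
import re

# Hypothetical test strings: valid, empty, trailing newline, invalid.
samples = ['.-', '', '.-\n', '.p..']

# $ matches at the end of the string or just before a final newline.
search_ok = [s for s in samples if re.search(r'^[-.]*$', s)]

# fullmatch only succeeds if the pattern consumes the entire string.
fullmatch_ok = [s for s in samples if re.fullmatch(r'[-.]*', s)]

print(search_ok)     # ['.-', '', '.-\n']
print(fullmatch_ok)  # ['.-', '']
```

Both variants accept the empty string, because * allows zero repetitions; replacing * with + would reject it.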
CodePudding user response:
Benchmarks of more versions (for each function, the three fastest of 20 rounds, in microseconds per call):
['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']
2.02 us 2.02 us 2.03 us filterfalse__re_search
2.03 us 2.05 us 2.05 us filter__issuperset
2.52 us 2.52 us 2.54 us filter__re_match
2.53 us 2.54 us 2.55 us filter__re_fullmatch
2.68 us 2.71 us 2.72 us filter__re_search
4.01 us 4.10 us 4.12 us listcomp__issubset
5.88 us 5.93 us 5.99 us listcomp__not_re_search
6.46 us 6.48 us 6.55 us listcomp__re_fullmatch
6.81 us 6.85 us 6.93 us listcomp__re_search
['.-', '-...', '-.-.', '-..', '.', '..-. teveel kolommen', '.---']
1.90 us 1.92 us 1.94 us filterfalse__re_search
2.06 us 2.09 us 2.10 us filter__issuperset
2.31 us 2.35 us 2.35 us filter__re_fullmatch
2.33 us 2.34 us 2.41 us filter__re_match
2.67 us 2.67 us 2.69 us filter__re_search
3.88 us 3.89 us 3.92 us listcomp__issubset
5.21 us 5.24 us 5.30 us listcomp__not_re_search
5.80 us 5.82 us 5.83 us listcomp__re_fullmatch
6.14 us 6.16 us 6.26 us listcomp__re_search
Code:
def listcomp__issubset(lst):
    return [s for s in lst if set(s).issubset(set(".-"))]

def filter__issuperset(lst):
    return [*filter(set('.-').issuperset, lst)]

def listcomp__re_search(lst):
    return [s for s in lst if re.search(r'^[-.]*$', s)]

def listcomp__re_fullmatch(lst):
    return [s for s in lst if re.fullmatch(r'[-.]*', s)]

def listcomp__not_re_search(lst):
    return [s for s in lst if not re.search(r'[^-.]', s)]

def filter__re_search(lst):
    return [*filter(re.compile(r'^[-.]*$').search, lst)]

def filter__re_fullmatch(lst):
    return [*filter(re.compile(r'[-.]*').fullmatch, lst)]

def filter__re_match(lst):
    return [*filter(re.compile(r'[-.]*$').match, lst)]

def filterfalse__re_search(lst):
    return [*filterfalse(re.compile(r'[^-.]').search, lst)]

funcs = [
    listcomp__issubset,
    filter__issuperset,
    listcomp__re_search,
    listcomp__re_fullmatch,
    listcomp__not_re_search,
    filter__re_search,
    filter__re_fullmatch,
    filter__re_match,
    filterfalse__re_search,
]

from timeit import repeat
from random import shuffle
from bisect import insort
import re
from itertools import filterfalse
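
def test(lst, expect):
    print(lst)

    # Check that every version returns the expected result.
    for func in funcs:
        result = func(lst)
        assert result == expect, func.__name__

    # Time each version in 20 rounds, shuffling the order every round.
    times = {func: [] for func in funcs}
    for _ in range(20):
        shuffle(funcs)
        for func in funcs:
            t = min(repeat(lambda: func(lst), number=1000)) / 1000
            insort(times[func], t)  # keeps each list of times sorted

    # Print the three best times per version, fastest version first.
    for func in sorted(funcs, key=times.get):
        print(*('%.2f us ' % (t * 1e6) for t in times[func][:3]), func.__name__)
    print()
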
test(['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-'],
     ['.-', '-...', '-.-.', '-..', '.', '.---', '-.-'])
test([".-","-...","-.-.","-..",".","..-. teveel kolommen",".---"],
     [".-","-...","-.-.","-..",".",".---"])
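One difference between listcomp__issubset and filter__issuperset is that the list comprehension evaluates set(".-") again for every item, while the filter version builds that set once. The sketch below keeps the list comprehension but hoists the set out of it. The names ALLOWED and listcomp__issubset_hoisted are hypothetical, and this variant was not part of the benchmark above, so no timing is claimed for it.

```python
# Hypothetical name: the allowed characters, built once at import time.
ALLOWED = frozenset(".-")

def listcomp__issubset_hoisted(lst):
    # Hypothetical variant: same result as listcomp__issubset, but the
    # allowed set is not rebuilt for each item. issuperset accepts any
    # iterable, so the string s is passed directly.
    return [s for s in lst if ALLOWED.issuperset(s)]

lst = [".-", "-...", "-.-.", "-..", ".", "..-. teveel kolommen", ".---"]
print(listcomp__issubset_hoisted(lst))
# ['.-', '-...', '-.-.', '-..', '.', '.---']
```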