A search on SO with just [regex] gave me 249'446 hits, and a search with [regex] inclusion exclusion gave me 47 hits, but I guess none of the latter (maybe some of the former?) fit my case.
I am also aware of references such as this regex page, https://www.regular-expressions.info/refquick.html, but I guess there might be a regex concept I am not yet familiar with, and I would be grateful for hints.
Here is a minimal example of what I am trying to do with a given list of strings.
Find all items which:
- have a fixed defined number of characters, i.e. length
- must include all characters from a certain list (doesn't matter at what position and if multiple times)
- must NOT include any characters from a certain list
Constructs like [ei^no]{4}, ((?![no])[ei]){4} and a lot of other, more complex trials didn't give the desired results.
Hence, I currently implemented this as a 3-step process: checking the length, then running two regex checks. This looks pretty cumbersome and inefficient to me.
Is there a more efficient way to do this?
Script:
import re

items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve']
count = 4
mustContain = 'ei'     # all of these characters at least once
mustNotContain = 'no'  # none of these characters

# Step 1: keep items with the required length
hits1 = []
for item in items:
    if len(item) == count:
        hits1.append(item)
print("Hits1:", hits1)

# Step 2: search for the required characters
# (a character class matches any one of them)
hits2 = []
for hit in hits1:
    regex = '[{}]'.format(mustContain)
    if re.search(regex, hit):
        hits2.append(hit)
print("Hits2:", hits2)

# Step 3: drop items containing any forbidden character
# (re.search scans the whole string; re.match only tests the start)
hits3 = []
for hit in hits2:
    regex = '[{}]'.format(mustNotContain)
    if not re.search(regex, hit):
        hits3.append(hit)
print("Hits3:", hits3)
Result:
Hits1: ['four', 'five', 'nine']
Hits2: ['five', 'nine']
Hits3: ['five']
Answer:
If you are interested in a regex approach, you can create a single dynamic pattern that looks like:
^(?=.{4}$)(?![^no\n]*[no])(?=[^e\n]*e)[^i\n]*i.*$
Explanation
- ^  Start of string
- (?=.{4}$)  Assert 4 characters
- (?![^no\n]*[no])  Assert no occurrence of n or o to the right, using a leading negated character class
- (?=[^e\n]*e)  Assert an e char to the right
- [^i\n]*i  Match any char except i and then match i
- .*  Match the rest of the line
- $  End of string
Example
import re
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
hits = [item for item in items if re.match(r"(?=.{4}$)(?![^no\n]*[no])(?=[^e\n]*e)[^i\n]*i.*$", item)]
print(hits)
Output
['five']
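The pattern in the example above is written out by hand. As a minimal sketch of how it could be built dynamically from the three inputs, assuming a hypothetical helper named build_pattern (not part of the original answer), something like the following should work. It uses one lookahead per required character instead of consuming the last one, which should be equivalent for matching purposes, and it passes every character through re.escape so that characters such as ] or ^ stay literal inside the character classes.

```python
import re

def build_pattern(count, must_contain, must_not_contain):
    """Hypothetical helper: compile one regex from the three constraints."""
    # Fixed length: exactly `count` characters before the end of the string.
    parts = [r"(?=.{%d}$)" % count]
    if must_not_contain:
        # Negative lookahead: fail if any forbidden character occurs.
        forbidden = "".join(re.escape(c) for c in must_not_contain)
        parts.append(r"(?![^%s\n]*[%s])" % (forbidden, forbidden))
    # One positive lookahead per required character.
    for c in must_contain:
        escaped = re.escape(c)
        parts.append(r"(?=[^%s\n]*%s)" % (escaped, escaped))
    # Consume the rest of the line once all assertions have passed.
    return re.compile("".join(parts) + r".*$")

items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
         'nine', 'ten', 'eleven', 'twelve', 'tree']
pattern = build_pattern(4, "ei", "no")
hits = [item for item in items if pattern.match(item)]
print(pattern.pattern)
print(hits)
```

Compiling the pattern once and reusing it avoids rebuilding the string for every item.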
Using a variation of all and a list comprehension:
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
count = 4
mustContain = ["e", "i"]     # all of these characters at least once
mustNotContain = ["n", "o"]  # none of these characters

hits = [
    item for item in items
    if len(item) == count
    and all(c in item for c in mustContain)
    and all(c not in item for c in mustNotContain)
]
print(hits)
Output
['five']
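The same three checks can also be expressed with set operations. This is only a sketch of an alternative, not something from the answers above: the function name satisfies and its snake_case parameters are made up for illustration. It builds the set of characters once per item, then tests the required characters as a subset and the forbidden characters as disjoint.

```python
def satisfies(item, count, must_contain, must_not_contain):
    """Hypothetical helper: check length, required and forbidden characters."""
    chars = set(item)
    return (
        len(item) == count
        and set(must_contain) <= chars           # every required char present
        and chars.isdisjoint(must_not_contain)   # no forbidden char present
    )

items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
         'nine', 'ten', 'eleven', 'twelve', 'tree']
hits = [item for item in items if satisfies(item, 4, "ei", "no")]
print(hits)
```

Whether this is faster than the all-based version depends on the data; for short strings like these the difference is likely negligible.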