My string contains AND, OR and NOT keywords; each of them is always upper case and prefixed and suffixed with a space.
This is my test-string:
X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F
I would like to get:
- all blocks connected with AND and separated by either OR, NOT or the beginning/end of the string. For my example I am looking for Z Z AND ZY AND ZZ as well as B AND C. This is what I came up with, which returns Z AND ZY AND ZZ instead of Z Z AND ZY AND ZZ because of the \w, but I cannot come up with any better idea:
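import re
input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
# \w+ only matches the single word right before AND, so "Z Z" is cut down to "Z"
and_pairs = re.findall(r"\w+ AND .+?(?= OR | NOT )", input_string)
# returns ['Z AND ZY AND ZZ', 'B AND C']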
- also I would need all terms preceded by a NOT, as well as all terms followed by an OR, in separate lists.
I don't want to seem lazy, but regex is driving me crazy (unintended rhyme).
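For reference, the \w problem in the attempt above can be avoided by matching "any character that does not start a separator" instead of \w. This is a minimal sketch using a tempered dot plus lookbehinds, not taken from any of the answers below; it assumes that a "term" in the second requirement means a whole block between separators (the question leaves that open), and the names CHAR and START are just illustrative.

```python
import re

text = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"

# One character that is not the start of a " OR " / " NOT " separator
# (a "tempered" dot), so a match can never run across a separator.
CHAR = r"(?:(?! OR | NOT ).)"
# A block may start only at the beginning of the string or right after a separator.
# Python's re needs fixed-width lookbehinds, hence one lookbehind per keyword.
START = r"(?:^|(?<= OR )|(?<= NOT ))"

# Blocks containing AND: lazy prefix up to the first " AND ", then the rest of the block.
and_blocks = re.findall(START + CHAR + r"*? AND " + CHAR + r"+", text)
# Assumption: a "term" is a whole block between separators.
after_not = re.findall(r"(?<= NOT )" + CHAR + r"+", text)
before_or = re.findall(START + CHAR + r"+(?= OR )", text)

print(and_blocks)
print(after_not)
print(before_or)
```

Because the patterns contain no capturing groups, re.findall returns the whole matched blocks, so no post-processing is needed.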
Answer:
I think this should do the trick:
>>> t_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
>>> [item.strip() for sublist in [x.split('NOT') for x in t_string.split('OR')] for item in sublist if 'AND' in item]
['Z Z AND ZY AND ZZ', 'B AND C']
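One caveat with splitting on the bare strings 'OR' and 'NOT': they also split inside terms that merely contain those letters. A minimal sketch comparing this answer's split with a split on the space-padded separators; the input "NOTE AND ORBIT OR Y" and both helper names are hypothetical, chosen only to show the difference.

```python
import re

text = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
tricky = "NOTE AND ORBIT OR Y"  # hypothetical: keywords appear inside terms


def and_blocks_bare_split(s):
    # the answer's approach: split on 'OR', then on 'NOT', keep items with 'AND'
    return [item.strip() for sublist in [x.split('NOT') for x in s.split('OR')]
            for item in sublist if 'AND' in item]


def and_blocks_padded_split(s):
    # split only on the space-padded separators, so ORBIT and NOTE stay whole
    return [block for block in re.split(r" OR | NOT ", s) if " AND " in block]


print(and_blocks_bare_split(tricky))
print(and_blocks_padded_split(tricky))
```

On the original test string both give the same result; on the tricky input the bare split returns 'E AND' while the padded split keeps 'NOTE AND ORBIT'.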
Answer:
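Try it with re.split:
import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
# the capture group keeps the separators in the result list as well
split_pairs = re.split("( OR | NOT )", input_string)
and_pairs = []
for and_block in split_pairs:
    if "AND" in and_block:
        and_pairs.append(and_block)  # collect every matching block instead of overwriting
print(and_pairs)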
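Because re.split puts whatever the capture group matched at the odd indices of its result, the same call can also produce the other two lists from the question. A minimal sketch, assuming "terms" means whole blocks between separators; here only the keyword is captured so the surrounding spaces are dropped from the pieces.

```python
import re

text = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"

# Blocks land on even indices, captured keywords on odd indices.
parts = re.split(r" (OR|NOT) ", text)
blocks, ops = parts[0::2], parts[1::2]

and_blocks = [b for b in blocks if " AND " in b]
# ops[i] sits between blocks[i] and blocks[i + 1]
after_not = [blocks[i + 1] for i, op in enumerate(ops) if op == "NOT"]
before_or = [blocks[i] for i, op in enumerate(ops) if op == "OR"]

print(and_blocks)
print(after_not)
print(before_or)
```

One split pass yields all three lists, which avoids writing a separate regex for each requirement.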
Answer:
Here's how to find the AND pairs:
import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
# prefix text, then a separator and the word right after it
matchRegex = r"(.*?)(?:(?: OR | NOT )(\w+))+?"
regexdata = re.findall(matchRegex, input_string)
regexdata = list(sum(regexdata, ()))  # flatten matches
print(regexdata)
matches = [""]
for idx, data in enumerate(regexdata):  # combine separated matches
    if idx % 2 == 0:
        matches[-1] += data  # prefix text continues the previous word
    else:
        matches.append(data)  # word after a separator starts a new block
print(matches)
matches = list(filter(lambda match: "AND" in match, matches))  # 'and' pairs only
print(matches)
Output:
['X', 'Y', '', 'Z', ' Z AND ZY AND ZZ', 'A', '', 'B', ' AND C', 'E', '', 'F']
['X', 'Y', 'Z Z AND ZY AND ZZ', 'A', 'B AND C', 'E', 'F']
['Z Z AND ZY AND ZZ', 'B AND C']
What this does is: first it matches with the regex, then it combines the separated regex groups (index 1 and 2 should be combined, 3 and 4, and so on). Once that is complete, it filters and outputs only the AND-connected parts. If you don't need that last part, you can remove it.
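One limitation worth knowing: this regex only captures text that is followed by an OR or NOT separator, so an AND block at the very end of the string is lost. A minimal sketch showing this, with a hypothetical "|$" patch (not part of the answer above) that lets a match also end at the end of the string; the input "A OR B AND C" and the and_blocks helper are illustrative only.

```python
import re

# Regex from the answer above.
ORIGINAL = r"(.*?)(?:(?: OR | NOT )(\w+))+?"
# Hypothetical patch: "|$" also ends a match at the end of the string,
# so the text after the last OR/NOT is not dropped.
PATCHED = r"(.*?)(?:(?: OR | NOT )(\w+)|$)"


def and_blocks(text, pattern):
    # findall returns (prefix, word) tuples; unmatched groups come back as ''
    flat = [part for pair in re.findall(pattern, text) for part in pair]
    matches = [""]
    for idx, data in enumerate(flat):  # same combine step as the answer
        if idx % 2 == 0:
            matches[-1] += data
        else:
            matches.append(data)
    # PATCHED can add an empty match at the end; the AND filter drops it
    return [m for m in matches if "AND" in m]


print(and_blocks("A OR B AND C", ORIGINAL))  # the trailing block is lost
print(and_blocks("A OR B AND C", PATCHED))
```

On the question's test string both patterns give the same AND blocks, because that string ends with "NOT F" rather than an AND block.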