I want to search for a substring in a log. The log looks like this:
log = "blablabla targets:['123-321', '123-456'] blablabla"
And here's my code snippet
node_ids = ['123-456', '123-321']
node_ids = re.escape(str(node_ids))
expected_result = f"targets:{node_ids}"
print(re.findall(expected_result, log))
Output
[]
Although node_ids contains all the IDs, the code returns nothing because the order of the IDs doesn't match the order in the log. Is there any way to make re.findall ignore the order of the IDs?
EDIT
Match condition: all IDs in node_ids are observed in log, following the format of expected_result.
When node_ids = ['123-321', '123-456']
I'm able to get the output
["targets:['123-321', '123-456']"]
CodePudding user response:
If the order does not matter, you can use a set for node_ids and compare the set of node IDs found in the log against that set.
import re
log = "blablabla targets:['123-321', '123-456'] blablabla"
node_ids = {'123-456', '123-321'}
# find all node ids in the log entry
ids = re.findall(r'\b\d{3}-\d{3}\b', log)
if node_ids == set(ids):
    print("found match:", ids)
Output:
found match: ['123-321', '123-456']
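Note that the snippet above collects every ID-shaped token anywhere in the log line, not only the ones inside targets:[...]. A minimal sketch of a variant that restricts the comparison to the targets list and returns the matched substring is shown here. The helper name find_targets is hypothetical, and the sketch assumes the bracketed part is a valid Python list literal, as in the sample log.

```python
import ast
import re

log = "blablabla targets:['123-321', '123-456'] blablabla"
node_ids = {'123-456', '123-321'}

def find_targets(log, node_ids):
    """Return each 'targets:[...]' substring whose IDs equal node_ids as a set.

    Hypothetical helper; assumes the bracketed text is a Python list literal.
    """
    matches = []
    for m in re.finditer(r"targets:(\[[^\]]*\])", log):
        try:
            # Parse the captured list text safely, then drop ordering.
            found = set(ast.literal_eval(m.group(1)))
        except (ValueError, SyntaxError, TypeError):
            continue  # not a parsable list of hashable items, skip it
        if found == set(node_ids):
            matches.append(m.group(0))
    return matches

print(find_targets(log, node_ids))
```

Because the comparison is done on sets, a list that repeats one of the expected IDs while still containing all of them would also be accepted; compare sorted lists instead if repeats should be rejected.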
CodePudding user response:
Use the regex or operator | so that the IDs can match in any order.
import re
log = "blablabla targets:['123-321', '123-456'] blablabla"
node_ids = ['123-456', '123-321']
pattern = fr"(targets:\[(?:\s*'(?:{'|'.join(node_ids)})',?\s*){'{' + str(len(node_ids)) + ',}'}\])"
result = re.findall(pattern, log)
pattern:
"(targets:\[(?:\s*'(?:123-456|123-321)',?\s*){2,}\])"
result:
["targets:['123-321', '123-456']"]
To ensure that all IDs in node_ids are observed, the pattern requires the number of ID matches to be at least the length of the list. This assumes that node IDs are never repeated; otherwise an input such as targets:['123-321','123-321'] would be matched falsely.
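If repeated IDs are a concern, one possible alternative (not part of the answer above) is to enumerate every ordering of node_ids with itertools.permutations and join the escaped orderings with |. This is only a sketch: it assumes the log formats the list exactly as str() would (single quotes, comma plus one space), and the number of alternatives grows factorially with the number of IDs, so it is only practical for short lists.

```python
import re
from itertools import permutations

log = "blablabla targets:['123-321', '123-456'] blablabla"
node_ids = ['123-456', '123-321']

# Each alternative is one full ordering of the list, escaped so it matches literally.
orderings = "|".join(re.escape(str(list(p))) for p in permutations(node_ids))
pattern = f"targets:(?:{orderings})"

result = re.findall(pattern, log)
print(result)

# A list that repeats one ID is not one of the orderings, so it does not match.
print(re.findall(pattern, "targets:['123-321', '123-321']"))
```

With the sample log, the first call should print the same single match as the answer above, and the second call should print an empty list.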