Python regex ignore order of words in string

Time:10-04

I want to search for a substring in a log. The log looks like:

log = "blablabla targets:['123-321', '123-456'] blablabla"

And here's my code snippet

import re

node_ids = ['123-456', '123-321']
node_ids = re.escape(str(node_ids))
expected_result = f"targets:{node_ids}"

print(re.findall(expected_result, log))

Output

[]

Although node_ids contains all the IDs, the code returns nothing because the order of the IDs doesn't match the order in the log. Is there any way to make re.findall ignore the order of the IDs?

EDIT: Match condition: every ID in node_ids appears in the log, following the format of expected_result.

When node_ids = ['123-321', '123-456'], I get the output

["targets:['123-321', '123-456']"]
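The pattern in the question is just the escaped text of str(node_ids), and str() renders a list in its current order, so the regex can only match one exact ordering. A small check of that (literal_pattern is a hypothetical helper name used only for illustration):

```python
import re

log = "blablabla targets:['123-321', '123-456'] blablabla"

def literal_pattern(ids):
    # str(ids) renders the list in its current order, e.g. "['123-456', '123-321']"
    return "targets:" + re.escape(str(ids))

# Reversed order: a different literal string, so nothing matches
print(re.findall(literal_pattern(['123-456', '123-321']), log))  # prints []
# Same order as the log: matches
print(re.findall(literal_pattern(['123-321', '123-456']), log))  # prints ["targets:['123-321', '123-456']"]
```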

Answer:

If the order does not matter, you can use a set for node_ids and compare it against the set of node IDs matched in the log.

import re

log = "blablabla targets:['123-321', '123-456'] blablabla"
node_ids = {'123-456', '123-321'}
# find all node ids in the log entry
ids = re.findall(r'\b\d{3}-\d{3}\b', log)
if node_ids == set(ids):
    print("found match:", ids)

Output:

found match: ['123-321', '123-456']
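The regex \b\d{3}-\d{3}\b scans the whole log line, so an unrelated ID elsewhere in the line (or a line without any targets: list) also affects the result. A sketch that first isolates the targets:[...] list and only then compares sets, assuming the list never contains a literal ']' and that IDs always have the ddd-ddd shape; targets_match is a hypothetical helper name:

```python
import re

log = "blablabla targets:['123-321', '123-456'] blablabla"
node_ids = {'123-456', '123-321'}

def targets_match(log, node_ids):
    # Capture only the text between "targets:[" and the closing "]"
    m = re.search(r"targets:\[([^\]]*)\]", log)
    if m is None:
        return False
    # Pull out the IDs inside that list (assumes the ddd-ddd ID shape)
    ids = re.findall(r"\b\d{3}-\d{3}\b", m.group(1))
    # Order-insensitive comparison
    return set(ids) == node_ids

print(targets_match(log, node_ids))  # True
```

With this version, a log such as "id 999-999 targets:['123-321', '123-456']" still counts as a match, whereas comparing every ID found anywhere in the line would not.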

Answer:

Use the regex alternation (or) operator |.

import re

log = "blablabla targets:['123-321', '123-456'] blablabla"
node_ids = ['123-456', '123-321']
# Each list item must be one of node_ids (alternation), repeated at least len(node_ids) times
pattern = fr"(targets:\[(?:\s*'(?:{'|'.join(node_ids)})',?\s*){{{len(node_ids)},}}\])"
result = re.findall(pattern, log)

pattern:

"(targets:\[(?:\s*'(?:123-456|123-321)',?\s*){2,}\])"

result:

["targets:['123-321', '123-456']"]

To ensure that all IDs in node_ids are observed, the pattern requires the number of ID matches to be equal to or greater than the length of the list. This assumes node IDs are never repeated; otherwise an input like targets:['123-321','123-321'] would be matched falsely.
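If repeated IDs cannot be ruled out, one alternative (a different technique from the answer above, offered only as a sketch) is one lookahead per ID: each (?=[^\]]*'ID') checks that the quoted ID occurs somewhere before the closing bracket, in any order, and only then is the list consumed. It assumes the list contains no ']' other than the closing one; targets_pattern is a hypothetical helper name:

```python
import re

def targets_pattern(node_ids):
    # One lookahead per ID: each requires "'<id>'" somewhere before the closing "]"
    lookaheads = "".join(rf"(?=[^\]]*'{re.escape(i)}')" for i in node_ids)
    # After all lookaheads succeed, consume the whole bracketed list
    return rf"targets:\[{lookaheads}[^\]]*\]"

node_ids = ['123-456', '123-321']
pattern = targets_pattern(node_ids)

# Both IDs present, order differs from node_ids: matches
print(re.findall(pattern, "blablabla targets:['123-321', '123-456'] blablabla"))
# '123-456' missing (duplicate '123-321'): no match
print(re.findall(pattern, "blablabla targets:['123-321', '123-321'] blablabla"))  # prints []
```

Unlike the alternation pattern, this accepts lists that also contain IDs outside node_ids; if extra IDs must be rejected, the set comparison from the first answer is the stricter check.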
