Regex matching multiple groups

Time:08-30

I am very new to regex and am trying to create a filter rule to get some matches. For instance, I have a query result like this:

application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total

Now I want to keep ONLY the lines that contain "outbound" AND "service_plus" AND "failure".

I tried to play with groups, but I am misunderstanding something somewhere, which results in wrong matches.

Regex which I used:

/(?:outbound)|(?:service_plus)|(?:failure)/
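A small sketch (in Python, with the /.../ delimiters dropped) of why this pattern returns wrong results: | means OR, so re.search succeeds as soon as any one of the keywords appears in a line.

```python
import re

lines = [
    "application_outbound_api_external_metrics_service_plus_success_total",
    "application_outbound_api_external_metrics_service_plus_failure_total",
    "application_inbound_api_metrics_service_success_total",
    "application_inbound_api_metrics_service_failure_total",
]

# Alternation: "outbound" OR "service_plus" OR "failure".
# A single keyword anywhere in the line is enough for a match.
alternation = re.compile(r'(?:outbound)|(?:service_plus)|(?:failure)')

matched = [line for line in lines if alternation.search(line)]
for line in matched:
    print(line)
```

Three of the four lines match (all except the third, which contains none of the keywords), even though only the second line contains all three.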

Answer:

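Your pattern uses alternation (|), which matches as soon as any one of the keywords is present. To require all three, regardless of their order, use multiple lookahead assertions: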

^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?

The above should be used with the MULTILINE flag so that ^ matches at the start of the string and at the start of every line.
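A minimal Python sketch of what the flag changes, using two of the sample lines. Without re.M, ^ only matches at position 0 and . never crosses a newline, so the lookaheads only ever inspect the first line.

```python
import re

text = (
    "application_outbound_api_external_metrics_service_plus_success_total\n"
    "application_outbound_api_external_metrics_service_plus_failure_total\n"
)
pattern = r'^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?'

# Without MULTILINE: only the first line is examined, and it has no "failure"
without_m = re.findall(pattern, text)

# With MULTILINE: ^ also matches right after each newline
with_m = re.findall(pattern, text, re.M)

print(without_m)  # []
print(with_m)     # only the second line, including its trailing newline
```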

  1. ^ - matches the start of the string or the start of a line.
  2. (?=.*outbound) - asserts that, at the current position, we can match 0 or more non-newline characters followed by 'outbound' without consuming any characters (i.e. the scan position is not advanced).
  3. (?=.*service_plus) - asserts that, at the current position, we can match 0 or more non-newline characters followed by 'service_plus' without consuming any characters (i.e. the scan position is not advanced).
  4. (?=.*failure) - asserts that, at the current position, we can match 0 or more non-newline characters followed by 'failure' without consuming any characters (i.e. the scan position is not advanced).
  5. .*\n? - matches 0 or more non-newline characters, optionally followed by a newline (optional in case the final line does not end with a newline character).
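The "without consuming any characters" part can be checked directly. In this Python sketch, the three lookaheads on their own succeed with a zero-length match; it is the trailing .* that actually consumes the line.

```python
import re

line = "application_outbound_api_external_metrics_service_plus_failure_total"

# Only the assertions: they succeed, but the scan position stays at 0
assertions_only = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure)')
print(assertions_only.match(line).span())  # (0, 0)

# Assertions plus .*: the .* consumes the whole line
with_body = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure).*')
print(with_body.match(line).group() == line)  # True

# A line missing one keyword fails the assertions, so there is no match at all
print(assertions_only.match("application_inbound_api_metrics_service_failure_total"))  # None
```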


In Python, for example:

import re

lines = """application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
failureoutboundservice_plus"""

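# re.M lets ^ match at the start of every line, not only at the start of the string
rex = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?', re.M)

# findall returns each matching line (with its trailing newline, if any); join them back together
filtered_lines = ''.join(rex.findall(lines))
print(filtered_lines)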

Prints:

application_outbound_api_external_metrics_service_plus_failure_total
failureoutboundservice_plus

Answer:

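You need lookaheads to assert that multiple things exist, regardless of the order in which they appear. Here, the (?:^|_) and (?:_|$) around each keyword also require it to be a whole underscore-delimited segment rather than part of a longer word: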

^(?=.*(?:^|_)outbound(?:_|$))(?=.*(?:^|_)service_plus(?:_|$))(?=.*(?:^|_)failure(?:_|$)).+$

https://regex101.com/r/Zhl4Mf/1
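A Python sketch comparing this underscore-bounded pattern with the unbounded one from the previous answer. It assumes the pattern ends in .+$ (at least one character up to the end of the line). The third sample line is hypothetical, added only to show the difference.

```python
import re

# Keywords must be whole underscore-delimited segments (or touch a line edge)
bounded = re.compile(
    r'^(?=.*(?:^|_)outbound(?:_|$))'
    r'(?=.*(?:^|_)service_plus(?:_|$))'
    r'(?=.*(?:^|_)failure(?:_|$)).+$'
)
# Keywords may appear anywhere, even glued to other letters
loose = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure).*')

samples = [
    "application_outbound_api_external_metrics_service_plus_failure_total",
    "failureoutboundservice_plus",
    "application_outbound_api_external_metrics_service_plus_failures_total",  # hypothetical
]
for s in samples:
    print(bool(loose.match(s)), bool(bounded.match(s)), s)
```

The loose pattern accepts all three lines; the bounded one accepts only the first, because in the other two at least one keyword runs into neighbouring letters.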


If the order does matter, build the regex with the parts in the required order:

^.*_outbound_.*_service_plus_failure_.*$

https://regex101.com/r/b7O5YK/1
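A quick Python check of the ordered pattern. Note that it is stricter than "in this order": it also requires service_plus to be immediately followed by _failure_. The last two sample lines are hypothetical.

```python
import re

ordered = re.compile(r'^.*_outbound_.*_service_plus_failure_.*$')

samples = [
    "application_outbound_api_external_metrics_service_plus_failure_total",
    # hypothetical: same keywords, but outbound comes last
    "application_service_plus_failure_api_outbound_total",
    # hypothetical: correct order, but service_plus and failure are not adjacent
    "application_outbound_api_service_plus_external_failure_total",
]
for s in samples:
    print(bool(ordered.match(s)), s)
```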
