I am very new to regex and am trying to create a filter rule to get some matches. For instance, I have a query result like this:
application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
Now I want to filter ONLY the lines which contain "outbound" AND "service_plus" AND "failure".
I tried to play with groups, but somewhere I am misunderstanding something, and my regex returns the wrong results.
Regex which I used:
/(?:outbound)|(?:service_plus)|(?:failure)/
CodePudding user response:
You should use multiple lookahead assertions:
^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?
The above should use the MULTILINE flag so that ^ is interpreted as start of string or start of line.
^ - matches start of string or start of line.
(?=.*outbound) - asserts that at the current position we can match 0 or more non-newline characters followed by 'outbound' without consuming any characters (i.e. the scan position is not advanced).
(?=.*service_plus) - asserts that at the current position we can match 0 or more non-newline characters followed by 'service_plus' without consuming any characters (i.e. the scan position is not advanced).
(?=.*failure) - asserts that at the current position we can match 0 or more non-newline characters followed by 'failure' without consuming any characters (i.e. the scan position is not advanced).
.*\n? - matches 0 or more non-newline characters optionally followed by a newline (optional in case the final line does not terminate in a newline character).
In Python, for example:
import re
lines = """application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
failureoutboundservice_plus"""
rex = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?', re.M)
filtered_lines = ''.join(rex.findall(lines))
print(filtered_lines)
Prints:
application_outbound_api_external_metrics_service_plus_failure_total
failureoutboundservice_plus
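For contrast, here is a small sketch of why the pattern from the question returns too much: alternation with | means "any one of these", while stacked lookaheads mean "all of these". The variable names (any_of, all_of, matched_any, matched_all) are just for illustration, and the input is the four sample lines from the question.

```python
import re

lines = [
    "application_outbound_api_external_metrics_service_plus_success_total",
    "application_outbound_api_external_metrics_service_plus_failure_total",
    "application_inbound_api_metrics_service_success_total",
    "application_inbound_api_metrics_service_failure_total",
]

# Pattern from the question: "outbound" OR "service_plus" OR "failure".
any_of = re.compile(r'(?:outbound)|(?:service_plus)|(?:failure)')

# Lookahead version: all three assertions must succeed at the start of the line.
all_of = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure)')

matched_any = [line for line in lines if any_of.search(line)]
matched_all = [line for line in lines if all_of.search(line)]

print(len(matched_any))  # 3: every line that contains at least one of the words
print(matched_all)       # only the outbound/service_plus/failure line
```

Only the third line (inbound, service_success) escapes the alternation, because it contains none of the three words.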
CodePudding user response:
You need to use lookaheads to assert that several substrings are all present, regardless of the order in which they appear:
^(?=.*(?:^|_)outbound(?:_|$))(?=.*(?:^|_)service_plus(?:_|$))(?=.*(?:^|_)failure(?:_|$)).+$
https://regex101.com/r/Zhl4Mf/1
If the order does matter then build a regex in the correct order:
^.*_outbound_.*_service_plus_failure_.*$
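To see how the two patterns differ, here is a minimal Python sketch. It assumes each name is tested on its own (so no MULTILINE flag is needed) and that the first pattern ends in .+$; the sample strings other than the first one are made up for illustration.

```python
import re

# Any order, but each word must be delimited by "_" or the string boundaries.
delimited = re.compile(
    r'^(?=.*(?:^|_)outbound(?:_|$))'
    r'(?=.*(?:^|_)service_plus(?:_|$))'
    r'(?=.*(?:^|_)failure(?:_|$)).+$'
)

# Fixed order: outbound, then service_plus_failure, each surrounded by "_".
ordered = re.compile(r'^.*_outbound_.*_service_plus_failure_.*$')

samples = [
    "application_outbound_api_external_metrics_service_plus_failure_total",
    "failure_service_plus_outbound",   # hypothetical: right words, different order
    "failureoutboundservice_plus",     # hypothetical: words not delimited by "_"
]

delimited_hits = [bool(delimited.match(s)) for s in samples]
ordered_hits = [bool(ordered.match(s)) for s in samples]

print(delimited_hits)  # [True, True, False]
print(ordered_hits)    # [True, False, False]
```

The delimited lookaheads accept the reordered name but reject the run-together one, whereas the ordered pattern accepts only names that follow the exact layout.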