Regex match zero or one group

Time:09-22

I have filenames in format <pod-name>_<namespace-name>_<container-name>-<dockerid>.log

For example:

pod-name_namespace-name_container-name-7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c.log
pod-name_namespace-name-1234567890_container-name-7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c.log

These are Kubernetes (k8s) container log files.

The namespace name may carry a numeric suffix that represents the automation system's run id (github.run_id, a 10-digit number).

I need to parse the filenames with a regex to extract the pod name, the namespace name without the run id, the run id, the container name and the docker id.

My regex is based on the default Fluent Bit Kubernetes parser, which I need to adapt for our use:

(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)(-(?<run_id>\d{10,}))_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

https://rubular.com/r/CROBxpHHgX5UZx

The regex above correctly parses filenames whose namespace contains a run id, but fails on a namespace without one:

pod-name_namespace-name_container-name-7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c.log

https://rubular.com/r/6MSQsnuGzrkVJG

In this case run_id should be an empty string.

How can I fix it so that it matches both cases?

CodePudding user response:

You can use

(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+?)(-(?<run_id>\d{10,}))?_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

See the regex demo.

The main point is to make two changes in the (?<namespace_name>[^_]+)(-(?<run_id>\d{10,})) part:

  • make the [^_]+ pattern lazy, so that it matches as few chars other than _ as possible, i.e. add a ? after the + quantifier
  • make the (-(?<run_id>\d{10,})) part optional by adding a ? quantifier after the group.
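To check both cases outside a regex tester, here is a minimal Python sketch of the fixed pattern. It is only an illustration, not part of any Fluent Bit configuration: Python's `re` module spells named groups `(?P<name>...)` instead of `(?<name>...)`, and the names `LOG_NAME_RE` and `parse_log_filename` are made up for this example. Python reports an unmatched optional group as `None`, so `groupdict(default="")` is used here to get the empty run_id the question asks for.

```python
import re
from typing import Optional

# Same pattern as the fixed regex, rewritten with Python's (?P<name>...) syntax.
LOG_NAME_RE = re.compile(
    r"(?P<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?"
    r"(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+?)"   # lazy, so it can stop before a run id
    r"(-(?P<run_id>\d{10,}))?"       # optional "-<run id>" suffix
    r"_(?P<container_name>.+)"
    r"-(?P<docker_id>[a-z0-9]{64})\.log$"
)


def parse_log_filename(filename: str) -> Optional[dict]:
    """Return the named parts of a log filename, or None if it does not match."""
    match = LOG_NAME_RE.search(filename)
    # default="" turns the unmatched optional run_id group into an empty string.
    return match.groupdict(default="") if match else None


if __name__ == "__main__":
    docker_id = "7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c"
    for name in (
        f"pod-name_namespace-name_container-name-{docker_id}.log",
        f"pod-name_namespace-name-1234567890_container-name-{docker_id}.log",
    ):
        print(parse_log_filename(name))
```

With the two sample filenames from the question, namespace_name comes out as namespace-name in both cases, and run_id is either an empty string or 1234567890.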