I am trying to parse Kubernetes Pod names for logging.
My pod names always look like this:
<deployment>-<replicaset>-<uid>
<job>-<uid> <-- if created by a job
Here are some samples:
events-worker-7c9b7bdc55-f7sgc
notification-585f6b94b8-t4jjc
report-generator-749ccf648d-gd9j7
static-content-8445d7f556-wbxvp
init-database-fm44h <-- if created by a job
What I am trying to get is the <deployment/job> part. For the samples above, this would be:
events-worker
notification
report-generator
static-content
init-database
I started with something like this:
(?<role_name>.*)(?:-[a-z0-9]{8,10})(?:-[a-z0-9]+)
and ended up with this:
(?:(?<role_name>[a-z0-9]+(?:-[a-z0-9]+)*))-(?<=-)[a-z0-9]+-(?:(?<=-)[a-z0-9]+)
But I am unable to match both cases (when the name has a replicaset part and when it has none).
It either does not match init-database-fm44h at all or captures only init instead of init-database.
Any help would be greatly appreciated.
Answer:
You can use
\b(?<!-)(?<role_name>[a-z0-9]+(?:-[a-z0-9]+)*?)(?:-([a-f0-9]{10}))?-([a-z0-9]+)\b(?!-)
Details:
\b(?<!-)
- a word boundary not immediately preceded by a hyphen
(?<role_name>[a-z0-9]+(?:-[a-z0-9]+)*?)
- Group "role_name" with ID 1: one or more letters or digits, then zero or more sequences of a hyphen followed by one or more letters or digits, as few times as possible
(?:-([a-f0-9]{10}))?
- an optional non-capturing group matching a hyphen and then ten hex characters, captured into Group 2
-
- a hyphen
([a-z0-9]+)
- Group 3: one or more letters or digits
\b(?!-)
- a word boundary not immediately followed by a hyphen
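To check the pattern against the samples from the question, here is a minimal sketch in Python. It rests on a few assumptions: the quantifiers in the pattern are read as + (one or more), the named group is spelled (?P<role_name>...) because that is the syntax Python's re module accepts, and the names POD_NAME, SAMPLES and role_name() are made up for this illustration. As in the pattern above, the optional replicaset part is only recognised when it is exactly ten hex characters, which holds for the samples shown.

```python
import re
from typing import Optional

# Same pattern as in the answer, with the named group in Python's (?P<name>...) syntax.
POD_NAME = re.compile(
    r"\b(?<!-)(?P<role_name>[a-z0-9]+(?:-[a-z0-9]+)*?)"  # deployment/job name, lazy
    r"(?:-([a-f0-9]{10}))?"                               # optional ten-char hex replicaset hash
    r"-([a-z0-9]+)\b(?!-)"                                # trailing uid, not followed by a hyphen
)


def role_name(pod_name: str) -> Optional[str]:
    """Return the deployment/job part of a pod name, or None if nothing matches."""
    match = POD_NAME.search(pod_name)
    return match.group("role_name") if match else None


# Pod names taken from the question.
SAMPLES = [
    "events-worker-7c9b7bdc55-f7sgc",
    "notification-585f6b94b8-t4jjc",
    "report-generator-749ccf648d-gd9j7",
    "static-content-8445d7f556-wbxvp",
    "init-database-fm44h",
]

if __name__ == "__main__":
    for pod in SAMPLES:
        print(pod, "->", role_name(pod))
```

The lazy quantifier inside role_name is what makes both cases work: the group grows one hyphenated segment at a time until the rest of the name can be consumed as an optional hash plus a final uid, so init-database-fm44h yields init-database rather than init.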