regex non greedy quantifier catching nothing, greedy catching too much

Time:09-29

I'm writing a Python regex that parses the content of a heading. However, the greedy quantifier captures too much, and the non-greedy quantifier captures nothing at all.

My string is

Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:

What I'm trying to do is extract the step number and the heading, excluding the trailing :. I've tried multiple regex strings and came up with these two:

r1 = r"Step ?([0-9]+) ?(.*) ?:?"
r2 = r"Step ?([0-9]+) ?(.*?) ?:?"

r1 captures the step number, but it also captures the : at the end. r2 captures the step number and '' (an empty string). I'm not sure how to handle the case where a .* is followed by an optional string.
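The behaviour described above can be reproduced with a short script. This is a minimal sketch that assumes the quantifier after `[0-9]` in both patterns is `+`; the names `headings`, `greedy` and `lazy` are only illustrative.

```python
import re

# The two patterns from the question, assuming "[0-9]+" for the step number.
r1 = r"Step ?([0-9]+) ?(.*) ?:?"
r2 = r"Step ?([0-9]+) ?(.*?) ?:?"

headings = [
    "Step 1 Introduce The Assets:",
    "Step2 Verifying the Assets",
    "Step 3Making sure all the data is in the right place:",
]

# Greedy (.*) runs to the end of the line, so it swallows the trailing colon.
greedy = [re.search(r1, h).groups() for h in headings]

# Lazy (.*?) takes as little as possible. Everything after it is optional,
# so the overall match already succeeds with an empty group 2.
lazy = [re.search(r2, h).groups() for h in headings]

for heading, g, l in zip(headings, greedy, lazy):
    print(repr(heading), g, l)
```

Nothing after the second group is required to match, so the greedy version has no reason to give the colon back and the lazy version has no reason to consume any characters.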

Necessary Edit: The heading might contain : inside the string, I just want to ignore the trailing one. I know I can strip(':') but I want to understand what I'm doing wrong.

CodePudding user response:

You can write the pattern with a negated character class, which removes the need for the non-greedy quantifier and the optional parts:

\bStep ?(\d+) ?([^:\n]+)
  • \bStep ? Match the word Step and an optional space
  • (\d+) ? Capture 1+ digits in group 1, followed by an optional space
  • ([^:\n]+) Capture 1+ chars other than : or a newline in group 2
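Applied to the sample string from the question, the negated character class pattern might be used as follows. This is a sketch, assuming the quantifiers are `+` (one or more); `text` and `steps` are illustrative names.

```python
import re

text = """Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:"""

# Group 1: the step number. Group 2: everything up to the first colon or newline.
pattern = re.compile(r"\bStep ?(\d+) ?([^:\n]+)")

# findall returns one (number, heading) tuple per match.
steps = pattern.findall(text)

for number, heading in steps:
    print(number, repr(heading))
```

Note that `[^:\n]+` stops at the first colon it meets, so a heading with a colon in the middle would be cut off at that point.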


If the colon has to be at the end of the string:

\bStep ?(\d+) ?([^:\n]+):?$
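The anchored pattern can be tried the same way. The sketch below assumes `+` quantifiers and uses `re.MULTILINE` so that `$` matches at the end of each line. The fourth heading is a hypothetical addition, not from the question, to cover the case where the heading contains a colon in the middle. The second pattern is not part of this answer: it is the question's own non-greedy attempt with only a `$` added, shown as one possible way to keep inner colons.

```python
import re

text = """Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:
Step 4 Note: check twice:"""  # last line is a hypothetical example

# Pattern from the answer: group 2 may not contain any colon at all.
anchored = re.compile(r"\bStep ?(\d+) ?([^:\n]+):?$", re.MULTILINE)

# Assumption: the question's lazy pattern plus an end-of-line anchor.
# The anchor forces (.*?) to keep expanding until only an optional
# space and an optional colon remain before the end of the line.
lazy_anchored = re.compile(r"\bStep ?(\d+) ?(.*?) ?:?$", re.MULTILINE)

anchored_steps = anchored.findall(text)
lazy_steps = lazy_anchored.findall(text)

print(anchored_steps)
print(lazy_steps)
```

With the negated class, the heading containing an inner colon does not match at all. With the anchored lazy group, only the trailing colon is dropped.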

