Regular expression to make non-greedy

Time:11-28

I have text like this:

EXPRESS      blood| muscle| testis| normal| tumor| fetus| adult
RESTR_EXPR   soft tissue/muscle tissue tumor

I want to extract only the last item in the EXPRESS line, which is adult.

My pattern is:

[|](.*?)\n

The match still comes out greedy: it captures muscle| testis| normal| tumor| fetus| adult. Is there any way to solve this?

CodePudding user response:

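Making the quantifier non-greedy does not help here, because the regex engine still starts at the leftmost pipe that can lead to a match. Instead, match a pipe followed by optional spaces, then capture the remaining characters in a group that excludes pipe characters, so the match can only begin at the last pipe on the line.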

If there has to be a newline at the end of the string:

\|[^\S\n]*([^|\n]*)\n

Explanation

  • \| Match |
  • [^\S\n]* Match optional whitespace chars without newlines
  • ( Capture group 1
    • [^|\n]* Match optional chars except for | or a newline
  • ) Close group 1
  • \n Match a newline
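
As a minimal sketch of how this behaves in practice, the Python snippet below runs both the pattern from the question and the pattern above against the sample text. Python's `re` module is an assumption here (the question does not name a language), and the variable names are only illustrative.

```python
import re

# Sample text from the question; each line ends with a newline.
text = (
    "EXPRESS      blood| muscle| testis| normal| tumor| fetus| adult\n"
    "RESTR_EXPR   soft tissue/muscle tissue tumor\n"
)

# Pattern from the question: the lazy quantifier does not move the
# start of the match, which is still the leftmost pipe.
original = re.search(r"[|](.*?)\n", text)
print(repr(original.group(1)))
# ' muscle| testis| normal| tumor| fetus| adult'

# Pattern from this answer: the capture group cannot cross a pipe,
# so only the last pipe on the line can be followed by the newline.
fixed = re.search(r"\|[^\S\n]*([^|\n]*)\n", text)
print(repr(fixed.group(1)))
# 'adult'
```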


Or asserting the end of the string:

\|[^\S\n]*([^|\n]*)$
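
A short sketch of the `$` variant, again assuming Python's `re` module. In Python, `$` only matches at the end of the whole string (or just before a final newline) unless `re.MULTILINE` is set, so the pattern works as-is on a single line but needs the flag when the EXPRESS line is followed by other lines.

```python
import re

pattern = r"\|[^\S\n]*([^|\n]*)$"

# Applied to the EXPRESS line on its own, $ is the end of the string.
line = "EXPRESS      blood| muscle| testis| normal| tumor| fetus| adult"
single = re.search(pattern, line)
print(single.group(1))  # adult

# Applied to the multi-line text, $ must be allowed to match at line ends.
text = line + "\nRESTR_EXPR   soft tissue/muscle tissue tumor\n"
no_flag = re.search(pattern, text)
print(no_flag)  # None

multi = re.search(pattern, text, re.MULTILINE)
print(multi.group(1))  # adult
```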

CodePudding user response:

You could use this one. It skips the space before the item, handles the \r\n case, and is non-greedy:

\|\s*([^|\r\n]*?)\r?\n
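
A minimal Python sketch of this approach on Windows-style line endings (Python's `re` module is an assumption, as the thread names no language). It writes the quantifier inside the capture group and excludes \r and \n from the character class, so that group 1 holds the whole item. For contrast, it also shows what happens when the quantifier sits outside the group: a repeated group keeps only its last repetition.

```python
import re

# Sample text from the question, with \r\n line endings.
text = (
    "EXPRESS      blood| muscle| testis| normal| tumor| fetus| adult\r\n"
    "RESTR_EXPR   soft tissue/muscle tissue tumor\r\n"
)

# Quantifier inside the group: group 1 is the whole last item.
inside = re.search(r"\|\s*([^|\r\n]*?)\r?\n", text)
print(repr(inside.group(1)))  # 'adult'

# Quantifier outside the group: the group is overwritten on every
# repetition, so only the final character survives.
outside = re.search(r"\|\s*([^\|])*?\r?\n", text)
print(repr(outside.group(1)))  # 't'
```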

