In Regex, how do I match until a char or another char indefinitely but don't group the last cha

Time:01-27

This is my string. I want my regex to return "bash" in group 1 and "585602" (the Pid value) in group 2:

Name:     bash
Umask:  0022
State:  S (sleeping)
Tgid:   585602
Ngid:   0
Pid:    585602
PPid:   585598
TracerPid:  0
Uid:    1000    1000    1000    1000
Gid:    1000    1000    1000    1000
FDSize: 256
Groups: 150 962 970 985 987 990 996 998 1000 
NStgid: 585602
NSpid:  585602
NSpgid: 585602
NSsid:  585602
VmPeak:     8708 kB
VmSize:     8708 kB
...

What I have now is:

Name:\t *(.*)\n(.|\n)*?Pid:\t *(.*)\n

Unfortunately, I'm seeing that the second matched group is the single newline before the P of "Pid", and the third one is the Pid value. I sense the problem is in the (.|\n) part of the regex, but if I remove the parentheses then it groups a lot of other stuff that I don't want. How would I go about having only bash and the pid value as groups?

CodePudding user response:

You get a newline in the second group because (.|\n) is itself a capture group that you are repeating, and a repeated capture group holds only the value of its last iteration.

The character before Pid: is a newline, that is the value of the capture group that you see.
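To make the last-iteration behaviour concrete, here is a minimal Python sketch. Python is only an assumption here, since the question does not name a regex flavour, and the sample string is a shortened copy of the input with a tab assumed after each colon. Turning the repeated group into a non-capturing one, (?:.|\n), is the smallest change that removes the unwanted group.

```python
import re

# Shortened sample in the layout from the question; the tab after each
# colon is an assumption about the real input.
status = "Name:\tbash\nUmask:\t0022\nTgid:\t585602\nPid:\t585602\nPPid:\t585598\n"

# Original pattern: the repeated group (.|\n) captures, so it becomes group 2
# and keeps only the last character it matched (the newline before "Pid").
original = re.search(r"Name:\t *(.*)\n(.|\n)*?Pid:\t *(.*)\n", status)
print(original.groups())  # ('bash', '\n', '585602')

# Same pattern with a non-capturing group (?:...): only two groups remain.
fixed = re.search(r"Name:\t *(.*)\n(?:.|\n)*?Pid:\t *(.*)\n", status)
print(fixed.groups())  # ('bash', '585602')
```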

Note that using (.|\n)* is not advisable, because an alternation inside a repetition is inefficient. Better options are an inline flag (?s), if supported, so that the dot also matches a newline, a character class such as [\s\S]*, or setting the equivalent flag in your programming language.
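As an illustration, the three alternatives could look like this in Python (an assumed language; other flavours name the flag differently). The patterns are simplified variants written for this sketch, not the final pattern from this answer.

```python
import re

# Shortened sample; the tab after each colon is an assumption.
text = "Name:\tbash\nUmask:\t0022\nPid:\t585602\n"

# 1. Inline flag (?s): the dot also matches newlines.
inline_flag = re.search(r"(?s)Name:\t *(\S+).*?\nPid:\t *(\d+)", text)

# 2. Character class [\s\S]: matches any character without needing a flag.
char_class = re.search(r"Name:\t *(\S+)[\s\S]*?\nPid:\t *(\d+)", text)

# 3. Flag passed through the API: re.DOTALL is Python's name for it.
api_flag = re.search(r"Name:\t *(\S+).*?\nPid:\t *(\d+)", text, re.DOTALL)

for m in (inline_flag, char_class, api_flag):
    print(m.groups())  # ('bash', '585602') each time
```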

You can use 2 capture groups (you don't really need 3), matching the Pid as digits and requiring at least one non-whitespace character \S at the start of the first capture group.

If you want to take the start and the end of the line into account, you can use the anchors ^ and $ (with the multiline flag enabled, so that they match at every line and not only at the start and end of the string).

\bName:\t *(\S.*)\n[\s\S]*?^Pid:\t *(\d+)\b

See a regex101 demo
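A minimal sketch of how this pattern might be used from Python (an assumption, as no language is given in the question). The helper name parse_status and the sample strings are made up for illustration. The second sample is deliberately reordered to show that without ^ the text "Pid:" can also match inside "PPid:".

```python
import re

# Pattern from the answer. re.MULTILINE is needed so that ^ matches
# at the start of every line, not only at the start of the string.
STATUS_RE = re.compile(r"\bName:\t *(\S.*)\n[\s\S]*?^Pid:\t *(\d+)\b", re.MULTILINE)

def parse_status(text):
    """Return (name, pid) from status-style text, or None if nothing matches."""
    m = STATUS_RE.search(text)
    return (m.group(1), int(m.group(2))) if m else None

# Shortened sample from the question; a tab after each colon is assumed.
sample = "Name:\tbash\nUmask:\t0022\nTgid:\t585602\nNgid:\t0\nPid:\t585602\nPPid:\t585598\n"
print(parse_status(sample))  # ('bash', 585602)

# Hypothetical reordering, only to show what the ^ anchor protects against.
reordered = "Name:\tbash\nPPid:\t585598\nPid:\t585602\n"
unanchored = re.search(r"\bName:\t *(\S.*)\n[\s\S]*?Pid:\t *(\d+)\b", reordered)
print(unanchored.group(2))      # 585598, taken from the PPid line
print(parse_status(reordered))  # ('bash', 585602)
```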

Or, as @anubhava suggests, lazily repeat a whole line followed by a newline, using (?:.*\n)*? instead of [\s\S]*?:

\bName:\t *(.*)\n(?:.*\n)*?Pid:\t *(\d+)\b

See another regex101 demo.
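The line-by-line variant can be sketched the same way, again assuming Python. Because every iteration of (?:.*\n) consumes exactly one full line, "Pid:" can only match at the start of a line, so no multiline flag is needed. The named groups and the sample string (with PPid moved before Pid) are additions made for this illustration.

```python
import re

# Same idea as the pattern above, with named groups added for readability.
LINEWISE_RE = re.compile(
    r"\bName:\t *(?P<name>.*)\n(?:.*\n)*?Pid:\t *(?P<pid>\d+)\b"
)

# Hypothetical sample with PPid placed before Pid; tabs after colons assumed.
sample = "Name:\tbash\nUmask:\t0022\nPPid:\t585598\nPid:\t585602\n"

m = LINEWISE_RE.search(sample)
result = {"name": m.group("name"), "pid": int(m.group("pid"))}
print(result)  # {'name': 'bash', 'pid': 585602}
```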

CodePudding user response:

In Perl, using slurp mode to read the whole string at once:

$: perl -ne 'BEGIN{$/=undef} /Name:\s+(\S+).*\nPid:\s+(\S+)/ms; print "$1 $2\n";'<<<"$str"
bash 585602