This is my string. I want my regex to return "bash" in group 1 and "585602" (the Pid value) in group 2.
Name: bash
Umask: 0022
State: S (sleeping)
Tgid: 585602
Ngid: 0
Pid: 585602
PPid: 585598
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 256
Groups: 150 962 970 985 987 990 996 998 1000
NStgid: 585602
NSpid: 585602
NSpgid: 585602
NSsid: 585602
VmPeak: 8708 kB
VmSize: 8708 kB
...
what I have now is
Name:\t *(.*)\n(.|\n)*?Pid:\t *(.*)\n
Unfortunately, the second matched group is the single newline before the P of "Pid", and the third one is the Pid value. I sense the problem is in the (.|\n) part of the regex, but if I remove the parentheses it groups a lot of other stuff that I don't want. How would I go about having only bash and the Pid value as groups?
CodePudding user response:
You get a newline in the second group because you are repeating the capture group itself, (.|\n)*, and a repeated capture group holds only the value of its last iteration.
The character right before Pid: is a newline, so that is the value you see in the capture group.
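The last-iteration behaviour can be checked with a short Python sketch. The sample string is a shortened, hypothetical version of the input, and it assumes a tab after each colon (as in /proc/<pid>/status). Making the repeated group non-capturing is one way to drop the unwanted group.

```python
import re

# Shortened sample; the tab after each colon is an assumption.
status = "Name:\tbash\nUmask:\t0022\nPid:\t585602\nPPid:\t585598\n"

# Pattern from the question: (.|\n) is a capture group that gets repeated,
# so it keeps only what it matched on its final iteration.
original = re.search(r"Name:\t *(.*)\n(.|\n)*?Pid:\t *(.*)\n", status)
print(original.groups())

# Same pattern with a non-capturing group (?:...): only two groups remain.
fixed = re.search(r"Name:\t *(.*)\n(?:.|\n)*?Pid:\t *(.*)\n", status)
print(fixed.groups())
```

The first call shows the stray newline as group 2; the second returns only the name and the Pid value.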
Note that using (.|\n)* is not advisable because of the alternation inside the repetition. Better options are an inline flag (?s) (if supported) to have the dot match a newline, a character class such as [\s\S]*, or setting the corresponding flag in your programming language so that the dot matches a newline.
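As a rough illustration, here are the three alternatives in Python, where the inline flag (?s) and re.DOTALL are both available. The sample string is a shortened, hypothetical input with an assumed tab after each colon, and the patterns are simplified for the comparison.

```python
import re

# Shortened sample; the tab after each colon is an assumption.
status = "Name:\tbash\nUmask:\t0022\nPid:\t585602\n"

# 1. Inline flag: (?s) lets the dot match newlines for the whole pattern.
inline_flag = re.search(r"(?s)Name:\t *(\S+).*?\nPid:\t *(\d+)", status)

# 2. Character class: [\s\S] matches any character, newline included.
char_class = re.search(r"Name:\t *(\S+)[\s\S]*?\nPid:\t *(\d+)", status)

# 3. Flag passed by the host language: re.DOTALL is Python's equivalent of (?s).
flag_arg = re.search(r"Name:\t *(\S+).*?\nPid:\t *(\d+)", status, re.DOTALL)

for match in (inline_flag, char_class, flag_arg):
    print(match.groups())
```

All three variants yield the same two groups, without an alternation inside the repetition.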
You can use 2 capture groups (you don't really need 3), matching the Pid as digits and requiring at least one non-whitespace character \S at the start of the first capture group.
If you want to take the start and the end of the line into account, you can anchor with ^ and $ (with the multiline flag enabled so they match at line boundaries):
\bName:\t *(\S.*)\n[\s\S]*?^Pid:\t *(\d+)\b
See a regex101 demo
Or, as @anubhava suggests, optionally repeat a whole line followed by a newline, non-greedily, using (?:.*\n)*? instead of [\s\S]*?:
\bName:\t *(.*)\n(?:.*\n)*?Pid:\t *(\d+)\b
See another regex101 demo.
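A small Python sketch for trying both patterns. It assumes the input uses a tab after each colon, and it passes re.MULTILINE for the first pattern so that ^ matches at the start of each line. The sample string is a shortened version of the status output from the question.

```python
import re

# Shortened sample; the tab after each colon is an assumption.
status = (
    "Name:\tbash\n"
    "Umask:\t0022\n"
    "State:\tS (sleeping)\n"
    "Tgid:\t585602\n"
    "Ngid:\t0\n"
    "Pid:\t585602\n"
    "PPid:\t585598\n"
)

# Variant 1: [\s\S]*? skips ahead lazily; ^ needs MULTILINE to anchor at a line start.
first = re.search(r"\bName:\t *(\S.*)\n[\s\S]*?^Pid:\t *(\d+)\b", status, re.MULTILINE)

# Variant 2: (?:.*\n)*? skips whole lines, so Pid: can only match at a line start.
second = re.search(r"\bName:\t *(.*)\n(?:.*\n)*?Pid:\t *(\d+)\b", status)

print(first.group(1), first.group(2))
print(second.group(1), second.group(2))
```

Both variants stop at the Pid: line and never at PPid:, because the match for Pid: has to begin at the start of a line.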
CodePudding user response:
In Perl, using slurp mode to load the whole string:
$: perl -ne 'BEGIN{$/=undef} /Name:\s+(\S+).*\nPid:\s+(\S+)/ms; print "$1 $2\n";'<<<"$str"
bash 585602