Grep PCRE Regex Non Capturing Groups

Time:10-26

From the following text I wish to extract the following two strings:

ip-10-x-x-x.eu-west-2.compute.internal

And

topology.kubernetes.io/zone=eu-west-2a

Full blob:

ip-10-x-x-x.eu-west-2.compute.internal   Ready    <none>   18d   v1.20.4-eks-1-20-1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a

I am using grep with a PCRE regex (grep -P) to extract the strings.

The following regex works on https://regex101.com/:

(((^ip.*?)(?=(\s)))(?:.*?)((?<=\,)(topology\.kubernetes\.io\/zone.*?)(?=(\s|$))))

But when I run it with grep on Bash v4.2, it returns the full blob rather than the regex groups, as seen here:

echo "ip-10-x-x-x.eu-west-2.compute.internal   Ready    <none>   18d   v1.20.4-eks-1-20-1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a" | grep -oP "(((^ip.*?)(?=(\s)))(?:.*?)((?<=\,)(topology\.kubernetes\.io\/zone.*?)(?=(\s|$))))"

What am I missing here?

CodePudding user response:

As Barmer comments, grep does not return capture groups; with -o it prints the entire match. You need to modify the regex so that each string you want is a complete match on its own:

echo "ip-10-x-x-x.eu-west-2.compute.internal   Ready    <none>   18d   v1.20.4-eks-1-20-1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a" | grep -oP "^ip\S+|(?<=\,)topology\.kubernetes\.io\/zone\S*(?=(?:\s|$))"

Output:

ip-10-x-x-x.eu-west-2.compute.internal
topology.kubernetes.io/zone=eu-west-2a
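
A slightly shorter variant of the same idea, offered only as a sketch: PCRE's \K resets the start of the reported match, so it can stand in for the lookbehind, and \S* already stops at whitespace or end of line, so the trailing lookahead can go. This assumes GNU grep built with -P support; extract_node_zone is a hypothetical helper name, not something from the question.

```shell
#!/usr/bin/env bash
# Sample line copied from the question.
line='ip-10-x-x-x.eu-west-2.compute.internal   Ready    <none>   18d   v1.20.4-eks-1-20-1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a'

# Hypothetical helper: reads node lines on stdin and prints each match on its own line.
# First alternative:  the leading token that starts with "ip".
# Second alternative: the zone label; \K drops the comma from the reported match.
extract_node_zone() {
  grep -oP '^ip\S+|,\Ktopology\.kubernetes\.io/zone\S*'
}

printf '%s\n' "$line" | extract_node_zone
```

Because grep -o reports each match separately, the two alternatives should come out as two lines, in the order they occur in the input.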

If you want to make use of your regex as is, try ripgrep:

echo "ip-10-x-x-x.eu-west-2.compute.internal   Ready    <none>   18d   v1.20.4-eks-1-20-1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a" | rg --pcre2 "(((^ip.*?)(?=(\s)))(?:.*?)((?<=\,)(topology\.kubernetes\.io\/zone.*?)(?=(\s|$))))" -r '$2'$'\n''$5'

which will produce the same results.
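
If installing ripgrep is not an option, Bash itself can return capture groups through the BASH_REMATCH array. The following is a minimal sketch, not a drop-in replacement for the regex above: Bash's =~ operator uses POSIX ERE, which has no \S, lazy quantifiers, or lookarounds, so the pattern is rewritten with bracket expressions. The variable names line, re, node and zone are chosen here for illustration.

```shell
#!/usr/bin/env bash
# Sample line copied from the question.
line='ip-10-x-x-x.eu-west-2.compute.internal   Ready    <none>   18d   v1.20.4-eks-1-20-1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a'

# Group 1: the leading token that starts with "ip".
# Group 2: the zone label, up to the next comma or whitespace.
# Keeping the pattern in a variable avoids quoting problems inside [[ ]].
re='^(ip[^[:space:]]+).*,(topology\.kubernetes\.io/zone[^[:space:],]*)'

if [[ $line =~ $re ]]; then
  node=${BASH_REMATCH[1]}
  zone=${BASH_REMATCH[2]}
  printf '%s\n%s\n' "$node" "$zone"
else
  echo "no match" >&2
fi
```

To process real input, the same test can be placed inside a while IFS= read -r line loop that reads the command output line by line.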

CodePudding user response:

If you are OK with awk, please try the following awk program.

awk '
match($0,/^ip\S+/){
  # print the leading token that starts with "ip"
  print substr($0,RSTART,RLENGTH)
  match($0,/,topology\.kubernetes\.io\/zone\S*/)
  # skip the leading comma of the second match
  print substr($0,RSTART+1,RLENGTH-1)
}
'  Input_file

Explanation: awk's match function is first used to match ^ip\S+, and the matched text is printed with substr. A second match call then matches ,topology\.kubernetes\.io\/zone\S* to get the second value mentioned by the OP, and substr prints it without the leading comma. Note that \S is a GNU awk extension.
