From the text below, I want to extract these two strings:
ip-10-x-x-x.eu-west-2.compute.internal
And
topology.kubernetes.io/zone=eu-west-2a
Full blob:
ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a
I am using grep with a PCRE regex (-P) to extract the strings.
The following regex works on https://regex101.com/
(((^ip.*?)(?=(\s)))(?:.*?)((?<=\,)(topology\.kubernetes\.io\/zone.*?)(?=(\s|$))))
But when I run it with grep under Bash 4.2, it returns the full blob rather than the capture groups:
echo "ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a" | grep -oP "(((^ip.*?)(?=(\s)))(?:.*?)((?<=\,)(topology\.kubernetes\.io\/zone.*?)(?=(\s|$))))"
What am I missing here?
Answer:
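As Barmer comments, grep does not print capture groups: with -o it prints each complete match of the whole pattern, one per line. You need to rewrite the regex so that each string you want is a complete match on its own: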
echo "ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a" | grep -oP "^ip\S+|(?<=\,)topology\.kubernetes\.io\/zone\S*(?=(?:\s|$))"
Output:
ip-10-x-x-x.eu-west-2.compute.internal
topology.kubernetes.io/zone=eu-west-2a
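A variant of the same idea, as a minimal sketch assuming GNU grep built with PCRE support (-P): PCRE's \K discards everything matched so far from the reported match, so the comma can be required without a lookbehind. The function name extract_node_fields is hypothetical, and the zone value is stopped at the next comma or whitespace in case further labels follow it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads node lines on stdin and prints the node name
# and the zone label, one per line. Needs GNU grep with PCRE support (-P).
extract_node_fields() {
  # -o prints each whole match on its own line; capture groups are ignored.
  # \K resets the start of the reported match, so the comma must be present
  # but is not printed. [^,\s]+ stops at the next comma or whitespace.
  grep -oP '^ip\S+|,\Ktopology\.kubernetes\.io/zone=[^,\s]+'
}

line='ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a'
printf '%s\n' "$line" | extract_node_fields
```

This should print the same two lines as the output above.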
If you want to use your regex as is, try ripgrep:
echo "ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a" | rg --pcre2 "(((^ip.*?)(?=(\s)))(?:.*?)((?<=\,)(topology\.kubernetes\.io\/zone.*?)(?=(\s|$))))" -r '$2'$'\n''$5'
which produces the same output: -r replaces the match (here the whole line) with capture group 2, a newline, and capture group 5.
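If neither PCRE grep nor ripgrep is available, Bash itself can return capture groups: [[ string =~ regex ]] fills the BASH_REMATCH array, which Bash 4.2 supports. The catch is that it uses POSIX extended regular expressions, so lazy quantifiers, lookarounds and \s are not available and the pattern has to be rewritten. This is a sketch; extract_with_bash_regex is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the node name and zone label of one node line.
# [[ =~ ]] uses POSIX ERE, so there are no lazy quantifiers, lookarounds or \s.
extract_with_bash_regex() {
  local line=$1
  # Group 1: "ip..." up to the first whitespace.
  # Group 2: the zone label, up to the next comma or whitespace.
  local re='^(ip[^[:space:]]+)[[:space:]].*,(topology\.kubernetes\.io/zone=[^,[:space:]]+)'
  if [[ $line =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1  # no match
  fi
}

line='ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a'
extract_with_bash_regex "$line"
```

The regex is kept in a variable and used unquoted on the right of =~, which is the usual way to avoid quoting surprises with [[ ]].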
Answer:
If you are OK with awk, try the following awk program:
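awk '
# Print the node name: "ip" followed by non-space characters at line start.
match($0,/^ip\S+/){
  print substr($0,RSTART,RLENGTH)
  # Find ",topology.kubernetes.io/zone=..." and print it without the comma.
  match($0,/,topology\.kubernetes\.io\/zone\S*/)
  print substr($0,RSTART+1,RLENGTH-1)
}
' Input_file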
Explanation: the program uses awk's match function to find ^ip\S+ and prints the matched text with substr($0,RSTART,RLENGTH). It then calls match again with the regex ,topology\.kubernetes\.io\/zone\S* to locate the second value the OP asked for, and prints it with substr starting one character after RSTART (and RLENGTH-1 characters long), so the leading comma is dropped.
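\S in an awk regex is a GNU awk extension, so for an awk that lacks it, a portable sketch can rely on field splitting instead. It assumes the input has the layout of kubectl get nodes --show-labels (node name in the first column, comma-separated labels in the last), which is what the blob appears to be; extract_with_awk is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper using only POSIX awk features (no \S).
# Assumes `kubectl get nodes --show-labels` layout: node name in the first
# column, comma-separated labels in the last column.
extract_with_awk() {
  awk '/^ip/ {
    print $1                          # node name
    n = split($NF, labels, ",")       # split the labels column on commas
    for (i = 1; i <= n; i++)
      if (index(labels[i], "topology.kubernetes.io/zone=") == 1)
        print labels[i]               # the zone label, printed as-is
  }'
}

line='ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a'
printf '%s\n' "$line" | extract_with_awk
```

Splitting on commas also means the zone value stops at the next label if the zone is not the last one in the list.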