Is there a way of telling a regular expression (specifically sed) to prefer using an optional component when the input also matches without using that component?
I'm trying to extract a number from a string, where the number may optionally be preceded by a prefix. It works in the following cases:
echo dummy/123456/dummy | sed "s:.*/\(prefix\)\?\([0-9]\{3,\}\)/.*:\2:"
123456
echo dummy/prefix123456/dummy | sed "s:.*/\(prefix\)\?\([0-9]\{3,\}\)/.*:\2:"
123456
but if the string contains both a prefixed number and a "bare" number, it chooses the bare number:
echo dummy/prefix123456/987654/dummy | sed "s:.*/\(prefix\)\?\([0-9]\{3,\}\)/.*:\2:"
987654
Is there a way of forcing sed to prefer the match including the prefix (123456)? All search results I've found talk of greedy/lazy options, which – as far as I can tell – don't apply here.
Clarifications
The dummy portions in the examples above may contain slashes.
The bit I'm interested in is either the first slash-delimited run of three or more digits (.../123456/...) or the first slash-delimited run of three or more digits with a prefix (.../prefix123456/...), whichever occurs first.
CodePudding user response:
You may try this sed command:
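sed '
# If the line contains a prefixed number, keep only that number.
/.*\/prefix\([0-9]\{3,\}\)\/.*/{
# The empty pattern reuses the regex from the address above.
s//\1/
# Branch to the end of the script so the fallback below is skipped.
b
}
# Otherwise, fall back to a bare number.
s/.*\/\([0-9]\{3,\}\)\/.*/\1/
' file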
which will print out
123456
123456
123456
123456
where the content of file is
dummy/123456/dummy
dummy/prefix123456/dummy
dummy/prefix123456/987654/dummy
dummy/987654/prefix123456/dummy
CodePudding user response:
Neither BRE nor ERE in sed supports a lazy quantifier such as a leading .*?.
However, based on your use cases, you may use this sed command:
sed -E 's~[^/]*/(prefix){0,1}([0-9]{3,})/.*~\2~' file
123456
123456
123456
where input is:
cat file
dummy/123456/dummy
dummy/prefix123456/dummy
dummy/prefix123456/987654/dummy
Here we are using a negated character class (bracket expression) [^/]* instead of .* so that the pattern matches zero or more characters that are not a /.
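To make the difference concrete, here is a minimal sketch that runs both patterns on the same input. The function names greedy and anchored are hypothetical helpers introduced only for this illustration. With .*, sed gives the leading part the longest possible match, so the last candidate on the line wins; with [^/]*, the leading part cannot cross a slash, so the match begins at the first slash-delimited candidate.

```shell
# Hypothetical helper: greedy .* takes as much of the line as it can,
# so the last slash-delimited number is the one captured.
greedy() {
  printf '%s\n' "$1" | sed -E 's~.*/(prefix)?([0-9]{3,})/.*~\2~'
}

# Hypothetical helper: [^/]* cannot cross a slash, so the match
# starts at the first slash-delimited number (prefixed or bare).
anchored() {
  printf '%s\n' "$1" | sed -E 's~[^/]*/(prefix)?([0-9]{3,})/.*~\2~'
}

greedy   'dummy/prefix123456/987654/dummy'   # prints 987654
anchored 'dummy/prefix123456/987654/dummy'   # prints 123456
```

Note that this only produces a clean result when the leading dummy part contains no slashes: the pattern is not anchored to the start of the line, so any earlier path components that sed has to skip over are left in the output.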
If you can consider perl, then .*? with a negative lookahead will work for you:
perl -pe 's~^.*?/(?:prefix)?(\d{3,})(?!.*prefix\d{3}).*~$1~' file
CodePudding user response:
With GNU awk, you could try the following code. Written and tested with the shown samples only.
awk 'match($0,/\/(prefix){0,1}([0-9]+)/,arr){print arr[2]}' Input_file
Explanation: this uses GNU awk's match function with the regex (prefix){0,1}([0-9]+), which has two capturing groups whose matched values are stored in the array named arr. If the match succeeds, the second element of that array is printed.
CodePudding user response:
Using sed
$ sed -E 's/[^0-9]*(prefix)?([0-9]{3,}).*/\2/' input_file
123456