I have been trying to extract part of a string in bash. I'm using it on a Mac.
Pattern of input string:
- Some random word followed by a slash (/). This part is optional.
- A keyword (def, foo, or bar) followed by a hyphen (-) followed by numbers. This can be a 2-6 digit number.
- These numbers are followed by a hyphen again and a few hyphen-separated words.
Sample inputs and outputs:
abc/def-1234-random-words // def-1234
bla/foo-12-random-words // foo-12
bar-12345-random-words // bar-12345
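The pattern described above maps fairly directly onto an extended regular expression. As a minimal sketch (not from the original post), grep can print just the matching part: -o prints only the match, -i ignores case and -E enables extended regexes, all of which exist in both the BSD grep on macOS and GNU grep. The function name extract is hypothetical.

```bash
# Hypothetical helper: read lines on stdin, print "<keyword>-<digits>".
extract() {
  # The trailing "-" enforces that the 2-6 digits are followed by a hyphen;
  # sed then strips that hyphen from the printed match.
  grep -oiE '(def|foo|bar)-[0-9]{2,6}-' | sed 's/-$//'
}

# Run it on the sample inputs from the question.
for input in abc/def-1234-random-words bla/foo-12-random-words bar-12345-random-words; do
  printf '%s\n' "$input" | extract
done
```

Note that grep -o prints every match on a line, one per line, so an input containing two keyword-number pairs would produce two results.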
So I tried the following commands to extract it, but for some reason they return the entire string:
extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-[^-]*\).*/\1/g'`
// and
extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-\d{2,6}\).*/\1/g'`
I also tried to make it case-insensitive using the I flag, but it threw an error:
: bad flag in substitute command: 'I'
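Since the question is about bash, it may be worth noting that bash can do this without sed at all: [[ string =~ regex ]] matches an extended regular expression and stores the capture groups in the BASH_REMATCH array, and shopt -s nocasematch makes the match case-insensitive. This is a sketch under assumptions: extract and re are hypothetical names, and it assumes a bash whose =~ honours nocasematch (the bash manual documents this; it is worth verifying on the older bash 3.2 that ships with macOS).

```bash
# Optional "anything/" prefix, then keyword-digits (group 2), then a hyphen.
# Keeping the regex in a variable avoids quoting pitfalls with =~.
re='^(.*/)?((def|foo|bar)-[0-9]{2,6})-'

# Hypothetical helper: print group 2 on success, return 1 on no match.
extract() {
  local input=$1 status=1
  shopt -s nocasematch                  # case-insensitive [[ =~ ]]
  if [[ $input =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"  # group 2 = keyword-digits
    status=0
  fi
  shopt -u nocasematch                  # restore case-sensitive matching
  return "$status"
}

for input in abc/def-1234-random-words bla/foo-12-random-words BAR-12345-random-words; do
  extract "$input"
done
```

The match keeps the input's original case, so BAR-12345-random-words yields BAR-12345.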
Answer:
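You can use the -E option to use extended regular expressions; then you don't have to escape ( and |. The default sed on macOS is BSD sed, whose basic regular expressions don't treat \| as alternation (that is a GNU extension), so the original pattern never matches and sed prints the whole line unchanged. POSIX regular expressions also have no \d, so use [0-9] for digits.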
echo abc/def-1234-random-words | sed -E -e 's/.*((def|bar|foo)-[^-]*).*/\1/g'
def-1234
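A slightly tightened variant of the command above, offered as a sketch rather than as part of the answer: it uses [0-9]{2,6} followed by a hyphen to enforce the 2-6 digit rule, and -n with the p flag so that a line without a match prints nothing instead of the whole line. It also uses $(...) instead of backticks for the assignment; getInputString from the question is replaced here by a literal string, and extract is a hypothetical name.

```bash
# Hypothetical helper: print "<keyword>-<digits>" only for matching lines.
extract() {
  # -n: don't print lines automatically; p: print only if s/// matched.
  # The g flag is dropped because the pattern consumes the whole line.
  sed -nE 's/.*((def|bar|foo)-[0-9]{2,6})-.*/\1/p'
}

extractedValue=$(printf '%s\n' 'abc/def-1234-random-words' | extract)
printf '%s\n' "$extractedValue"   # prints def-1234
```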
Answer:
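This GNU sed command should work with the ignore-case (I) flag. The default sed on macOS is BSD sed, which doesn't support I, hence the error in the question; GNU sed can be installed with Homebrew (brew install gnu-sed) and is then available as gsed: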
sed -E 's~^(.*/){0,1}((def|foo|bar)-[0-9]{2,6})-.*~\2~I' file
def-1234
foo-12
bar-12345
This sed command matches:
- ^(.*/){0,1}: optionally match a string up to a / at the start (capture group #1)
- (: start capture group #2
- (def|foo|bar): match def, foo, or bar
- -: match a -
- [0-9]{2,6}: match 2 to 6 digits
- ): end capture group #2
- -.*: match a - followed by anything until the end
- The replacement is the value captured in group #2
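If GNU sed isn't available, one workaround (an assumption on my part, not part of the answer above) is to drop the I flag and spell each keyword letter as a bracket expression such as [dD]. This relies only on -E, so it should also run with the macOS sed. The sketch additionally uses -n with the p flag so non-matching lines print nothing; extract is a hypothetical name.

```bash
# Hypothetical helper: case-insensitive keyword match without the I flag.
# Group numbering is unchanged: \2 is still keyword-digits.
extract() {
  sed -nE 's~^(.*/){0,1}(([dD][eE][fF]|[fF][oO][oO]|[bB][aA][rR])-[0-9]{2,6})-.*~\2~p'
}

printf '%s\n' abc/DEF-1234-random-words bla/Foo-12-random-words bar-12345-random-words | extract
```

The output keeps the input's case (DEF-1234, Foo-12, bar-12345), just as the I flag would.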
Or you can use this awk command:
awk -v IGNORECASE=1 -F / 'match($NF, /^(def|foo|bar)-[0-9]{2,6}-/) {print substr($NF, 1, RLENGTH-1)}' file
def-1234
foo-12
bar-12345
Awk explanation:
- -v IGNORECASE=1: enable case-insensitive matching (this variable is specific to GNU awk; other awks, including the default awk on macOS, treat it as an ordinary variable)
- -F /: use / as the field separator
- match($NF, /^(def|foo|bar)-[0-9]{2,6}-/): match the regex ^(def|foo|bar)-[0-9]{2,6}- against $NF, which is the last field when / is the field separator (this skips any text before the /)
- If the match is successful, use substr to print the text from position 1 to RLENGTH-1 (RLENGTH-1 because the match includes the - after the digits)
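For awks other than GNU awk, a possible sketch (my assumption, not the answer's method) is to match against tolower($NF) while printing from the unmodified $NF; lowercasing ASCII text doesn't change its length, so RLENGTH still lines up with the original field. It also spells out "2 to 6 digits" as [0-9][0-9] followed by four optional [0-9]? so it doesn't depend on {m,n} interval support, which some older awks lack. The name extract is hypothetical.

```bash
# Hypothetical helper: portable, case-insensitive version of the awk answer.
extract() {
  awk -F / 'match(tolower($NF), /^(def|foo|bar)-[0-9][0-9][0-9]?[0-9]?[0-9]?[0-9]?-/) {
    # Print from the original field so the input case is preserved;
    # RLENGTH - 1 drops the hyphen that follows the digits.
    print substr($NF, 1, RLENGTH - 1)
  }'
}

printf '%s\n' abc/DEF-1234-random-words bla/foo-12-random-words Bar-12345-random-words | extract
```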