How to extract part of string in Bash using regex

Time:10-07

I have been trying to extract part of a string in Bash. I'm working on a Mac.

Pattern of input string:

  • Some random word followed by a /. This part is optional.
  • A keyword (def, foo, or bar) followed by a hyphen (-) and then a number of 2 to 6 digits.
  • The number is followed by another hyphen and a few hyphen-separated words.

Sample inputs and outputs:

abc/def-1234-random-words // def-1234
bla/foo-12-random-words // foo-12
bar-12345-random-words // bar-12345

So I tried the following commands to fetch it, but for some reason they return the entire string.

extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-[^-]*\).*/\1/g'`
// and
extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-\d{2,6}\).*/\1/g'`

I also tried to make it case-insensitive using the I flag, but it threw an error:

: bad flag in substitute command: 'I'



CodePudding user response:

You can use the -E option to enable extended regular expressions; then you don't have to escape ( and |.

echo abc/def-1234-random-words  | sed -E -e 's/.*((def|bar|foo)-[^-]*).*/\1/g'
def-1234
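To tie this back to the question, here is a minimal sketch of capturing the result in a variable. The body of getInputString below is a hypothetical stand-in for the asker's command, which is not shown in the post, and $(...) is used in place of the backticks.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the asker's getInputString command;
# the real command is not shown in the question.
getInputString() {
  printf '%s\n' 'abc/def-1234-random-words'
}

# Capture the sed output with $(...), the nestable equivalent of backticks.
# -E enables extended regular expressions, so ( ) and | are not escaped.
extractedValue=$(getInputString | sed -E 's/.*((def|bar|foo)-[^-]*).*/\1/')

printf '%s\n' "$extractedValue"
```

For the sample input used here, the variable should end up holding def-1234.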

CodePudding user response:

This GNU sed command should work with the ignore-case flag:

sed -E 's~^(.*/){0,1}((def|foo|bar)-[0-9]{2,6})-.*~\2~I' file

def-1234
foo-12
bar-12345

This sed command matches:

  • (.*/){0,1}: Optionally match a string up to a / at the start
  • (: Start capture group #2
    • (def|foo|bar): Match def or foo or bar
    • -: Match a -
    • [0-9]{2,6}: Match 2 to 6 digits
  • ): End capture group #2
  • -.*: Match a - followed by anything up to the end of the line
  • The substitution is the value captured in group #2
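Since the sed in the question rejected the I flag, one possible workaround is to spell out both letter cases with bracket expressions. The following is only a sketch under that assumption: extract_key is a hypothetical helper name, and it assumes a sed that accepts -E.

```shell
#!/usr/bin/env bash
# extract_key is a hypothetical helper, not part of the original answer.
# It applies the same pattern as above, but replaces the I flag with
# bracket expressions such as [Dd] that list both cases explicitly.
extract_key() {
  printf '%s\n' "$1" |
    sed -E 's~^(.*/)?(([Dd][Ee][Ff]|[Ff][Oo][Oo]|[Bb][Aa][Rr])-[0-9]{2,6})-.*~\2~'
}

# Run the helper on the sample inputs from the question.
for input in abc/def-1234-random-words bla/foo-12-random-words bar-12345-random-words; do
  extract_key "$input"
done
```

The ~ delimiter is kept from the answer so that the / inside the pattern needs no escaping, and (.*/)? is equivalent to (.*/){0,1}.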

Or you may use this awk:

awk -v IGNORECASE=1 -F / 'match($NF, /^(def|foo|bar)-[0-9]{2,6}-/) {print substr($NF, 1, RLENGTH-1)}' file

def-1234
foo-12
bar-12345

Awk explanation:

  • -v IGNORECASE=1: Enable case-insensitive matching (IGNORECASE is a GNU awk feature)
  • -F /: Use / as field separator
  • match($NF, /^(def|foo|bar)-[0-9]{2,6}-/): Match the regex ^(def|foo|bar)-[0-9]{2,6}- against $NF, the last field when / is the field separator (this ignores any text before the /)
  • If the match succeeds, use substr to print the text from position 1 to RLENGTH-1 (since the match extends through the - after the digits)
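The same pattern can also be applied without sed or awk, using Bash's built-in =~ operator, which takes extended regular expressions. This is a sketch, not something from the answers above: extract_key_bash is a hypothetical helper name, and case-insensitivity comes from the nocasematch shell option.

```shell
#!/usr/bin/env bash
# extract_key_bash is a hypothetical helper, not part of the original answers.
# The body uses ( ) instead of { }, so it runs in a subshell and the
# nocasematch setting does not leak into the calling shell.
extract_key_bash() (
  shopt -s nocasematch  # make [[ =~ ]] ignore letter case
  # Optional "prefix/", then keyword, hyphen, 2 to 6 digits, and a hyphen.
  re='^(.*/)?((def|foo|bar)-[0-9]{2,6})-'
  # BASH_REMATCH[2] holds the text captured by the second group.
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
)

# Run the helper on the sample inputs from the question.
for input in abc/def-1234-random-words bla/foo-12-random-words bar-12345-random-words; do
  extract_key_bash "$input"
done
```

The helper returns a non-zero status when the input does not match, so a caller can test it with if or ||.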