extract the word/ line using bash

Time:08-09

I am trying to extract a word or several words from a string using bash. I tried to follow https://stackoverflow.com/a/27534223/13816738 but was only partially successful. I have a string that looks like s=abc-rabb-123 or s=abc-xyt-ppt-abt-004-456.

What I would like is to get the middle word or words, such as rabb or xyt-ppt-abt-004. Any ideas?

Actual code, scenario 1:

s='extract-zskqxcrbdj-1823'
[[ "$s" =~ (-[^[:space:]^-]+) ]]
echo "${BASH_REMATCH[1]}"

Output: -zskqxcrbdj

I want: zskqxcrbdj

Scenario 2:

s='abc-xyt-ppt-abt-004-456'
[[ "$s" =~ (-[^[:space:]^-]+) ]]
echo "${BASH_REMATCH[1]}"

Output: -xyt

I want: xyt-ppt-abt-004

CodePudding user response:

If the sole purpose is to strip off the first and last fields of a - delimited string, one idea is to use bash parameter expansion; this eliminates the need to spawn any subprocesses (e.g., for sed/cut/awk):

for s in 'abc-rabb-123' 'abc-xyt-ppt-abt-004-456' 'extract-zskqxcrbdj-1823'
do
    echo "############ $s"
    x="${s#*-}"
    x="${x%-*}"
    echo "${x}"
done

This generates:

############ abc-rabb-123
rabb
############ abc-xyt-ppt-abt-004-456
xyt-ppt-abt-004
############ extract-zskqxcrbdj-1823
zskqxcrbdj
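
As a side note on the two expansions used above, here is a minimal sketch of how the single and doubled operators differ: # and % remove the shortest matching prefix/suffix, while ## and %% remove the longest. The sample string is taken from the question.

```shell
s='abc-xyt-ppt-abt-004-456'

echo "${s#*-}"    # shortest prefix matching '*-' removed: xyt-ppt-abt-004-456
echo "${s##*-}"   # longest prefix matching '*-' removed:  456
echo "${s%-*}"    # shortest suffix matching '-*' removed: abc-xyt-ppt-abt-004
echo "${s%%-*}"   # longest suffix matching '-*' removed:  abc
```

This is why the answer uses the single # and % forms: the doubled forms would strip everything except the last or first field.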

Another approach uses a regex and the BASH_REMATCH[] array:

regex='^[^-]*-(.*)-[^-]*$'

for s in 'abc-rabb-123' 'abc-xyt-ppt-abt-004-456' 'extract-zskqxcrbdj-1823'
do
    echo "############ $s"
    if [[ "${s}" =~ $regex ]]
    then
        x="${BASH_REMATCH[1]}"
        echo "${x}"
    fi
done

Some comments on regex:

  • I've opted to anchor the beginning/ending of the regex with ^ and $
  • ^[^-]* - from start of string match 0 or more characters that are not a -
  • - - a literal -
  • (.*) - (1st capture group) all characters; because the rest of the regex must still match, this ends just before the last - in the string
  • - - a literal -
  • [^-]*$ - match 0 or more characters that are not -, match until the end of the string
  • if there's a match then BASH_REMATCH[1] should contain the contents of the 1st capture group
  • NOTE: add typeset -p BASH_REMATCH to see the entire contents of the array

This generates:

############ abc-rabb-123
rabb
############ abc-xyt-ppt-abt-004-456
xyt-ppt-abt-004
############ extract-zskqxcrbdj-1823
zskqxcrbdj
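
To see what the capture group holds relative to the whole match, here is a small sketch using one of the sample strings. It requires bash (not plain sh); index 0 of BASH_REMATCH is the text matched by the entire regex and index 1 is the first capture group.

```shell
regex='^[^-]*-(.*)-[^-]*$'
s='abc-xyt-ppt-abt-004-456'

if [[ "$s" =~ $regex ]]; then
    # Index 0: everything the regex matched (here the whole string,
    # because the regex is anchored with ^ and $).
    echo "whole match: ${BASH_REMATCH[0]}"
    # Index 1: the text captured by the first pair of parentheses.
    echo "group 1:     ${BASH_REMATCH[1]}"
fi
```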

NOTE: OP can decide whether additional checks are needed for a string that contains fewer than three - delimited fields.
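
One possible shape for such a check, as a sketch only: the function name middle_fields is hypothetical (it is not from the answer above), and the choice to print nothing and return a non-zero status for short input is an assumption about what the caller wants.

```shell
# middle_fields: hypothetical helper, prints everything between the
# first and last '-' of its argument.
middle_fields() {
    local s=$1 x
    case $s in
        *-*-*) ;;          # at least two '-' characters: three or more fields
        *) return 1 ;;     # too few fields: print nothing, signal failure
    esac
    x="${s#*-}"            # drop the first field and its '-'
    x="${x%-*}"            # drop the last field and its '-'
    printf '%s\n' "$x"
}

middle_fields 'abc-xyt-ppt-abt-004-456'
middle_fields 'abc-123' || echo 'fewer than three fields'
```

Without the guard, the two parameter expansions alone would turn abc-123 into 123 and leave a string with no - at all unchanged.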

CodePudding user response:

This can be done with the sed utility:

echo "abc-xyt-ppt-abt-004-456" | sed 's/[^-]*-\(.*\)-.*/\1/'

Output:

xyt-ppt-abt-004
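
A variation on the same substitution, as a sketch: sed prints lines unchanged when the pattern does not match, so if the input may contain lines with fewer than three fields, one option is to anchor the pattern and combine -n with the p flag so that only successfully rewritten lines are printed. The input lines here are sample data, with nodashes added as an assumed non-matching case.

```shell
out=$(printf '%s\n' 'abc-rabb-123' 'nodashes' 'abc-xyt-ppt-abt-004-456' |
    sed -n 's/^[^-]*-\(.*\)-[^-]*$/\1/p')

# Only the two lines with at least three fields are printed.
printf '%s\n' "$out"
```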

CodePudding user response:

echo "abc-xyt-ppt-abt-004-456" | awk -F'-' '{a=""; for (i=2;i<NF;i++) {d=(i<NF-1)?"-":""; a=a $i d}; print a}'
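
The loop above rebuilds the middle fields one at a time. A shorter alternative, offered as a sketch rather than as a correction, is to delete the first and last fields with two sub() calls and print what remains:

```shell
out=$(echo 'abc-xyt-ppt-abt-004-456' |
    awk '{ sub(/^[^-]*-/, ""); sub(/-[^-]*$/, ""); print }')

# First sub() removes the leading field and its '-',
# second sub() removes the trailing '-' and the last field.
echo "$out"
```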

CodePudding user response:

You can use the cut command:

echo abc-xyt-ppt-abt-004-456 | cut -d'-' -f2-5

Result: xyt-ppt-abt-004

echo abc-rabb-123 | cut -d'-' -f2

Result: rabb

In these cases -d sets the delimiter/separator, which is -, and -f selects a field, a list of fields, or a range. You can also do something like:

echo abc-xyt-ppt-abt-004-456 | cut -d'-' -f2,3,5

Result: xyt-ppt-004
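
The field numbers above are hard-coded, so they only fit a string with a known number of fields. A sketch of one way to compute the range instead, assuming the string has at least three fields (with fewer, the range passed to cut would be invalid): count the - characters and use that count as the last field to keep.

```shell
s='abc-xyt-ppt-abt-004-456'

# With N dashes there are N+1 fields, so the last middle field is field N.
# The $(( )) wrapper strips any padding that wc may add to its output.
n=$(( $(printf '%s' "$s" | tr -cd '-' | wc -c) ))

middle=$(printf '%s\n' "$s" | cut -d'-' -f2-"$n")
echo "$middle"
```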

CodePudding user response:

If you just want to strip both ends (mawk, nawk, or gawk):

echo 'abc-xyt-ppt-abt-004-456' | awk ++NF OFS= FS='^[^-]*-|-[^-]*$'
xyt-ppt-abt-004