I am trying to extract a word or words from a string using bash. I tried to follow https://stackoverflow.com/a/27534223/13816738 but was only partially successful. I have strings that look like the ones below:
s = abc-rabb-123
or s = abc-xyt-ppt-abt-004-456
What I would like is to get the middle word or words, such as rabb
or xyt-ppt-abt-004
Any ideas?
Actual code, Scenario 1:
s='extract-zskqxcrbdj-1823'
[[ "$s" =~ (-[^[:space:]^-]+) ]];
echo "${BASH_REMATCH[1]}"
Output: -zskqxcrbdj
I want: zskqxcrbdj
Scenario 2:
s='abc-xyt-ppt-abt-004-456'
[[ "$s" =~ (-[^[:space:]^-]+) ]];
echo "${BASH_REMATCH[1]}"
Output: -xyt
I want: xyt-ppt-abt-004
CodePudding user response:
If the sole purpose is to strip off the first and last `-`-delimited fields, one idea would be to use bash parameter expansion/substitution; this in turn eliminates the need to spawn any subprocesses (e.g., for `sed`/`cut`/`awk`):
for s in 'abc-rabb-123' 'abc-xyt-ppt-abt-004-456' 'extract-zskqxcrbdj-1823'
do
    echo "############ $s"
    x="${s#*-}"
    x="${x%-*}"
    echo "${x}"
done
This generates:
############ abc-rabb-123
rabb
############ abc-xyt-ppt-abt-004-456
xyt-ppt-abt-004
############ extract-zskqxcrbdj-1823
zskqxcrbdj
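The two expansions above rely on the single-character operators `#` and `%`, which remove the shortest matching prefix and suffix. As a small sketch of why that matters (the variable names are only illustrative and not from the answer), compare them with the doubled forms `##` and `%%`, which remove the longest match:

```bash
s='abc-xyt-ppt-abt-004-456'

# '#' removes the shortest prefix matching '*-' (the first field and its hyphen)
short_prefix="${s#*-}"     # xyt-ppt-abt-004-456
# '##' removes the longest prefix matching '*-' (everything up to the last hyphen)
long_prefix="${s##*-}"     # 456
# '%' removes the shortest suffix matching '-*' (the last field and its hyphen)
short_suffix="${s%-*}"     # abc-xyt-ppt-abt-004
# '%%' removes the longest suffix matching '-*' (everything from the first hyphen on)
long_suffix="${s%%-*}"     # abc

printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix"
```

Using the doubled forms by mistake would keep only the last or only the first field, which is why the answer uses `#` followed by `%`.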
One approach using a regex and the `BASH_REMATCH[]` array:
regex='^[^-]*-(.*)-[^-]*$'
for s in 'abc-rabb-123' 'abc-xyt-ppt-abt-004-456' 'extract-zskqxcrbdj-1823'
do
    echo "############ $s"
    if [[ "${s}" =~ $regex ]]
    then
        x="${BASH_REMATCH[1]}"
        echo "${x}"
    fi
done
Some comments on the regex:
- I've opted to anchor the beginning/ending of the regex with `^` and `$`
- `^[^-]*` - from the start of the string, match 0 or more characters that are not a `-`
- `-` - a literal `-`
- `(.*)` - (1st capture group) all characters
- `-` - a literal `-`
- `[^-]*$` - match 0 or more characters that are not a `-`, up to the end of the string
- if there's a match then `BASH_REMATCH[1]` should contain the contents of the 1st capture group (NOTE: add `typeset -p BASH_REMATCH` to see the entire contents of the array)
This generates:
############ abc-rabb-123
rabb
############ abc-xyt-ppt-abt-004-456
xyt-ppt-abt-004
############ extract-zskqxcrbdj-1823
zskqxcrbdj
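To see what the regex captures, it can help to print every element of `BASH_REMATCH` after a successful match. This is a minimal sketch assuming bash; the loop variable `i` and the variable `middle` are illustrative names. Index 0 holds the text matched by the whole regex and index 1 holds the first capture group.

```bash
regex='^[^-]*-(.*)-[^-]*$'
s='abc-xyt-ppt-abt-004-456'

if [[ "$s" =~ $regex ]]
then
    # walk over every index that bash filled in for this match
    for i in "${!BASH_REMATCH[@]}"
    do
        printf 'BASH_REMATCH[%d]=%s\n' "$i" "${BASH_REMATCH[$i]}"
    done
    # keep the 1st capture group, i.e. the middle fields
    middle="${BASH_REMATCH[1]}"
fi
```

Because `(.*)` is greedy, the capture group extends up to the last `-` in the string, which is what keeps all of the middle fields together.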
NOTE: OP can decide whether additional checks need to be added for strings that contain fewer than three `-`-delimited fields.
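One possible form of such a check, as a sketch only: the helper name `middle_fields` is hypothetical, and returning a non-zero status for short inputs is an assumption about what the OP would want. It counts the hyphens in the input and applies the two parameter expansions only when at least two hyphens (three fields) are present.

```bash
# middle_fields is a hypothetical helper, not part of the original answer
middle_fields() {
    local s="$1" hyphens x
    # delete every character except '-' so the length equals the hyphen count
    hyphens="${s//[^-]/}"
    # fewer than two hyphens means fewer than three fields: nothing to extract
    if (( ${#hyphens} < 2 ))
    then
        return 1
    fi
    x="${s#*-}"      # drop the first field
    x="${x%-*}"      # drop the last field
    printf '%s\n' "$x"
}

middle_fields 'abc-xyt-ppt-abt-004-456'
middle_fields 'abc-123' || echo 'fewer than three fields'
```

Without a guard like this, an input such as abc-123 would lose both of its fields and produce an empty or misleading result.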
CodePudding user response:
This can be done with the `sed` utility:
echo "abc-xyt-ppt-abt-004-456" | sed 's/[^-]*-\(.*\)-.*/\1/'
Output:
xyt-ppt-abt-004
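Since `sed` applies the substitution to every input line, several strings can be handled in one call. The following is a sketch, assuming a `sed` that accepts `-E` for extended regular expressions (GNU and BSD sed both do). Anchoring with `^` and `$` and using `[^-]*` for the outer fields is a variation on the expression above, not the answerer's exact command, and `extract_middle` is a hypothetical wrapper name.

```bash
# extract_middle is a hypothetical wrapper used only for illustration
extract_middle() {
    # \1 is the text between the first and the last hyphen of each line
    sed -E 's/^[^-]*-(.*)-[^-]*$/\1/'
}

printf '%s\n' 'abc-rabb-123' 'abc-xyt-ppt-abt-004-456' 'extract-zskqxcrbdj-1823' | extract_middle
```

Lines that do not contain at least two hyphens do not match the pattern and are printed unchanged.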
CodePudding user response:
echo "abc-xyt-ppt-abt-004-456" | awk -F'-' '{{for (i=2;i<NF;i++) {d=i<NF-1?"-":"";a=a$i""d}};print a}'
CodePudding user response:
You can use the `cut` command:
echo abc-xyt-ppt-abt-004-456 | cut -d'-' -f2-5
Result: xyt-ppt-abt-004
echo abc-rabb-123 | cut -d'-' -f2
Result: rabb
In these cases, `-d` sets the delimiter/separator, which is `-`, and `-f` selects a field, a list of fields, or a range. You can also do something like:
echo abc-xyt-ppt-abt-004-456 | cut -d'-' -f2,3,5
Result: xyt-ppt-004
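The fixed field numbers above depend on knowing how many fields the string has. When the field count varies, the upper bound of the range can be computed first. This is a rough sketch under that assumption; the function name `cut_middle` is hypothetical, and it uses bash arithmetic alongside `cut`.

```bash
# cut_middle is a hypothetical helper, not part of the original answer
cut_middle() {
    local s="$1" hyphens n
    # number of fields = number of hyphens + 1
    hyphens="${s//[^-]/}"
    n=$(( ${#hyphens} + 1 ))
    # at least three fields are needed for a middle part to exist
    (( n >= 3 )) || return 1
    # select fields 2 through n-1
    printf '%s\n' "$s" | cut -d'-' -f"2-$(( n - 1 ))"
}

cut_middle 'abc-xyt-ppt-abt-004-456'
cut_middle 'abc-rabb-123'
```

For the six-field string this runs `cut` with `-f2-5`, and for the three-field string with `-f2-2`, matching the two hand-written commands above.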
CodePudding user response:
If you just want to strip both ends (`{m,n,g}awk` is shorthand for any of mawk, nawk, or gawk):
{m,n,g}awk NF++ OFS= FS='^[^-]*-|-[^-]*$'
xyt-ppt-abt-004