I have a string, for example 'home/JOHNSMITH-4991-common-task-list', and I want to extract the uppercase part and the numbers, with the hyphen between them. I echo the string and pipe it to sed like so, but I keep getting all the hyphens I don't want, e.g.:
echo home/JOHNSMITH-4991-common-task-list | sed 's/[^A-Z0-9-]//g'
gives me:
JOHNSMITH-4991---
I need:
JOHNSMITH-4991
How do I ignore all but the first hyphen?
Thanks.
CodePudding user response:
You can use
sed 's,.*/\([^-]*-[^-]*\).*,\1,'
POSIX BRE regex details:
- .* - any zero or more chars
- / - a / char
- \([^-]*-[^-]*\) - Group 1: any zero or more chars other than -, a hyphen, and then again zero or more chars other than -
- .* - any zero or more chars
The replacement is the Group 1 placeholder, \1, to restore just the text captured.
See the online demo:
#!/bin/bash
s="home/JOHNSMITH-4991-common-task-list"
sed 's,.*/\([^-]*-[^-]*\).*,\1,' <<< "$s"
# => JOHNSMITH-4991
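The same substitution can also be sketched with extended regular expressions, which drops the backslashes around the group. This assumes a sed that accepts -E (GNU and BSD sed both do); the function name extract_sed is made up for illustration and is not part of the answer above.

```shell
# Hypothetical helper: prints the text between the last slash and the second hyphen.
extract_sed() {
  # -E selects ERE, so the capture group is written ( ... ) instead of \( ... \).
  printf '%s\n' "$1" | sed -E 's,.*/([^-]*-[^-]*).*,\1,'
}

extract_sed "home/JOHNSMITH-4991-common-task-list"
# => JOHNSMITH-4991
```

Because the leading .* is greedy, the / in the pattern lands on the last slash of the input, so deeper paths behave the same way.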
CodePudding user response:
1st solution: With awk this is easy to keep simple. Please try the following, written and tested with your shown samples.
echo "home/JOHNSMITH-4991-common-task-list" | awk -F'/|-' '{print $2"-"$3}'
Explanation: set the field separator to / OR - and print the 2nd field, a -, and the 3rd field of the current line.
2nd solution: Using the match function of awk.
echo "home/JOHNSMITH-4991-common-task-list" |
awk '
match($0,/\/[^-]*-[^-]*/){
print substr($0,RSTART+1,RLENGTH-1)
}'
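To see why the match approach shifts the start by one and shortens the length by one, here is a small sketch that prints the values awk sets after a successful match. The function name show_match is hypothetical; RSTART and RLENGTH are standard awk variables.

```shell
# Hypothetical helper: shows what match() records for the sample string.
show_match() {
  printf '%s\n' "$1" |
  awk 'match($0, /\/[^-]*-[^-]*/) {
    # The matched text includes the leading slash, so skip one char and drop one from the length.
    printf "RSTART=%d RLENGTH=%d result=%s\n", RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 1)
  }'
}

show_match "home/JOHNSMITH-4991-common-task-list"
# => RSTART=5 RLENGTH=15 result=JOHNSMITH-4991
```

Note that match() finds the leftmost match, so with more than one slash in the input the result would start after the first slash, not the last.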
3rd solution: Using GNU grep. The -o option prints only the matched part, and the -P option enables PCRE (Perl-compatible regular expressions). The regex uses .*/ followed by \K to discard the part matched so far, then [^-]*-[^-]* to match everything up to the 2nd occurrence of - in the line.
echo "home/JOHNSMITH-4991-common-task-list" | grep -oP '.*/\K[^-]*-[^-]*'
CodePudding user response:
Here is a simple alternative solution using cut with bash parameter expansion:
s='home/JOHNSMITH-4991-common-task-list'
cut -d- -f1-2 <<< "${s##*/}"
JOHNSMITH-4991
CodePudding user response:
You could match until the first occurrence of the /, then clear the match buffer with \K, and then repeat the character class 1+ times with a hyphen in between, so that at least one character is required before and after the hyphen.
[^/]*/\K[A-Z0-9]+-[A-Z0-9]+
If supported, using gnu grep:
echo "home/JOHNSMITH-4991-common-task-list" | grep -oP '[^/]*/\K[A-Z0-9]+-[A-Z0-9]+'
Output
JOHNSMITH-4991
If gnu awk is an option, using the same pattern but with a capture group:
echo "home/JOHNSMITH-4991-common-task-list" | awk 'match($0, /[^\/]*\/([A-Z0-9]+-[A-Z0-9]+)/, a) {print a[1]}'
If the desired output is always the first match where the character class with a hyphen matches:
echo "home/JOHNSMITH-4991-common-task-list" | awk -v FPAT="[A-Z0-9]+-[A-Z0-9]+" '$0=$1'
Output
JOHNSMITH-4991
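Since \K, FPAT and the three-argument match are GNU-specific, here is a rough sketch of the "first match" idea using only -o and -E, which both GNU and BSD grep accept (-o is not required by POSIX, so treat that as an assumption). The function name first_id is made up for illustration.

```shell
# Hypothetical helper: prints the first UPPER/DIGIT run, hyphen, UPPER/DIGIT run in the input.
first_id() {
  # LC_ALL=C keeps [A-Z] limited to ASCII uppercase regardless of the user's locale.
  printf '%s\n' "$1" | LC_ALL=C grep -oE '[A-Z0-9]+-[A-Z0-9]+' | head -n 1
}

first_id "home/JOHNSMITH-4991-common-task-list"
# => JOHNSMITH-4991
```

Unlike the \K version, this does not require a slash before the match; it simply takes the first place the character classes match.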
CodePudding user response:
Assumptions:
- could be more than one fwd slash in string
- (after the last fwd slash) there are 2 or more hyphens in the string
- desired output is between last fwd slash and 2nd hyphen
One idea using parameter substitutions:
$ string='home/dir/JOHNSMITH-4991-common-task-list'
$ string1="${string##*/}"
$ typeset -p string1
declare -- string1="JOHNSMITH-4991-common-task-list"
$ string1="${string1%%-*}"
$ typeset -p string1
declare -- string1="JOHNSMITH"
$ string2="${string#*-}"
$ typeset -p string2
declare -- string2="4991-common-task-list"
$ string2="${string2%%-*}"
$ typeset -p string2
declare -- string2="4991"
$ newstring="${string1}-${string2}"
$ echo "${newstring}"
JOHNSMITH-4991
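The steps above could be wrapped in a function along these lines. This is only a sketch: the name extract_param is hypothetical, and it takes both pieces from the text after the last slash, so a hyphen in a directory name would not affect the result. It uses only POSIX parameter expansions.

```shell
# Hypothetical helper; the ( ... ) body runs in a subshell so the variables stay local.
extract_param() (
  base=${1##*/}        # strip everything up to and including the last slash
  first=${base%%-*}    # text before the first hyphen
  rest=${base#*-}      # text after the first hyphen
  second=${rest%%-*}   # text before the next hyphen
  printf '%s-%s\n' "$first" "$second"
)

extract_param "home/dir/JOHNSMITH-4991-common-task-list"
# => JOHNSMITH-4991
```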
NOTES:
- typeset commands added solely to show progression of values
- a bit of typing, but if doing this a lot of times in bash the overall performance should be good compared to other solutions that require spawning a sub-process
- if there's a need to parse a large number of strings, best performance will come from streaming all strings at once (via a file?) to one of the other solutions (eg, a single awk call that processes all strings will be faster than running the set of strings through a bash loop and performing all of these parameter substitutions)
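As a sketch of the "single awk call" idea from the last note, the following reads any number of strings in one process. The function name extract_all and the second sample path are made up for illustration; it assumes one path per line.

```shell
# Hypothetical helper: reads paths from the named files, or from stdin if none are given.
extract_all() {
  awk -F'/' '{
    split($NF, parts, "-")        # $NF is the text after the last slash
    print parts[1] "-" parts[2]   # keep only the first two hyphen-separated pieces
  }' "$@"
}

printf '%s\n' \
  'home/JOHNSMITH-4991-common-task-list' \
  'home/dir/JANEDOE-1234-other-task' | extract_all
# => JOHNSMITH-4991
# => JANEDOE-1234
```

With the strings stored in a file, calling extract_all with the file name as its argument would process them without any shell loop.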