Sed: how do I take only the first occurrence of a hyphen?

I have a string, for example 'home/JOHNSMITH-4991-common-task-list', and I want to extract the uppercase part and the number, with the hyphen between them. I echo the string and pipe it to sed like this, but I keep getting all the other hyphens too:

echo home/JOHNSMITH-4991-common-task-list | sed 's/[^A-Z0-9-]//g'

gives me:

JOHNSMITH-4991---

I need:

JOHNSMITH-4991

How do I ignore all but the first hyphen?

Thanks.

Answer:

You can use

sed 's,.*/\([^-]*-[^-]*\).*,\1,'

POSIX BRE regex details:

.* - any zero or more chars
/ - a / char
\([^-]*-[^-]*\) - Group 1: any zero or more chars other than -, a hyphen, and then again zero or more chars other than -
.* - any zero or more chars

The replacement is the Group 1 placeholder, \1, so only the captured text is kept.

See the online demo:

#!/bin/bash
s="home/JOHNSMITH-4991-common-task-list"
sed 's,.*/\([^-]*-[^-]*\).*,\1,' <<< "$s"
# => JOHNSMITH-4991
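If your sed supports -E (GNU sed and BSD sed both do), the same pattern can be written as an ERE, where the group parentheses need no backslashes. A sketch, assuming -E is available:

```shell
#!/bin/sh
s="home/JOHNSMITH-4991-common-task-list"
# -E switches sed to extended regexes, so ( ) form the capture group directly.
# .*/ is greedy, so it consumes everything up to the LAST slash.
printf '%s\n' "$s" | sed -E 's,.*/([^-]*-[^-]*).*,\1,'
# => JOHNSMITH-4991
```

Because .*/ runs to the last slash, a deeper path such as home/dir/JOHNSMITH-4991-common-task-list gives the same result.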

Answer:

1st solution: With awk this is much simpler. Please try the following, written and tested with your shown samples.

echo "home/JOHNSMITH-4991-common-task-list" | awk -F'/|-' '{print $2"-"$3}'

Explanation: the field separator is set to either / or -, and the line prints the 2nd field, a hyphen, and the 3rd field of the current line.
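To see why $2 and $3 are the right fields, you can dump every field awk produces with that separator. This is only a quick diagnostic, not part of the solution:

```shell
#!/bin/sh
# -F'/|-' is treated as a regex: a field ends at either "/" or "-".
# The loop prints each field with its number.
printf '%s\n' "home/JOHNSMITH-4991-common-task-list" |
  awk -F'/|-' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
# 1: home
# 2: JOHNSMITH
# 3: 4991
# 4: common
# 5: task
# 6: list
```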

2nd solution: Using awk's match function.

echo "home/JOHNSMITH-4991-common-task-list" |
awk '
match($0,/\/[^-]*-[^-]*/){
  # the match starts at the "/", so skip it: start 1 later, take 1 fewer
  print substr($0,RSTART+1,RLENGTH-1)
}'
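match() sets RSTART (the 1-based start of the match) and RLENGTH (its length). Printing both makes the offsets used by substr visible:

```shell
#!/bin/sh
printf '%s\n' "home/JOHNSMITH-4991-common-task-list" |
  awk 'match($0, /\/[^-]*-[^-]*/) {
    # The match is "/JOHNSMITH-4991": it starts at the slash (position 5)
    # and is 15 characters long.
    print "RSTART=" RSTART, "RLENGTH=" RLENGTH
    # Skip the slash: start one later, take one fewer.
    print substr($0, RSTART + 1, RLENGTH - 1)
  }'
# RSTART=5 RLENGTH=15
# JOHNSMITH-4991
```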

3rd solution: Using GNU grep. The -o option prints only the matched part, and the -P option enables PCRE (Perl-compatible regular expressions). In the pattern, .*/ matches everything up to the last /, \K drops that part from the reported match, and [^-]*-[^-]* then takes the text up to just before the 2nd occurrence of -.

echo "home/JOHNSMITH-4991-common-task-list" | grep -oP '.*/\K[^-]*-[^-]*'
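-P (and therefore \K) is not available in every grep; BSD grep, for instance, may lack it. A rough alternative using only -oE, assuming no hyphen appears in the directory part of the path, is to let the match include the leading slash and strip it with cut:

```shell
#!/bin/sh
# [^/-]+ refuses both "/" and "-", so the match cannot run past the next
# hyphen or across a directory separator.
# cut -c2- drops the leading "/" that the match had to include.
printf '%s\n' "home/JOHNSMITH-4991-common-task-list" |
  grep -oE '/[^/-]+-[^/-]+' | cut -c2-
# JOHNSMITH-4991
```

Unlike .*/\K, this does not jump straight to the last slash, which is why the no-hyphen-in-directories assumption matters.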

Answer:

Here is a simple alternative using cut with bash parameter expansion:

s='home/JOHNSMITH-4991-common-task-list'
cut -d- -f1-2 <<< "${s##*/}"

JOHNSMITH-4991
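${s##*/} removes the longest prefix matching */, i.e. everything up to the last slash, while the single-# form removes only the shortest such prefix. The difference shows once the path has more than one directory (the home/dir/... value below is just an example). This sketch pipes with printf instead of the <<< here-string so it also runs in a POSIX sh:

```shell
#!/bin/sh
s='home/dir/JOHNSMITH-4991-common-task-list'
printf '%s\n' "${s#*/}"    # shortest prefix removed: dir/JOHNSMITH-4991-common-task-list
printf '%s\n' "${s##*/}"   # longest prefix removed:  JOHNSMITH-4991-common-task-list
# cut keeps hyphen-separated fields 1 and 2 of the last path component.
printf '%s\n' "${s##*/}" | cut -d- -f1-2   # JOHNSMITH-4991
```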

Answer:

You could match up to the first occurrence of /, then clear the match buffer with \K, and then repeat the character class 1 or more times with a hyphen in between, so that at least one character is required before and after the hyphen.

[^/]*/\K[A-Z0-9]+-[A-Z0-9]+

If supported, using GNU grep:

echo "home/JOHNSMITH-4991-common-task-list" | grep -oP '[^/]*/\K[A-Z0-9]+-[A-Z0-9]+'

Output

JOHNSMITH-4991
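Using + instead of * is what enforces "at least one character" on each side of the hyphen. The difference shows with a made-up input whose last component starts with a hyphen: a [^-]* pattern still produces a result, while the [A-Z0-9]+ pattern finds nothing. This is shown with sed -E rather than grep -P, so it does not need PCRE:

```shell
#!/bin/sh
s='home/-4991-common-task-list'   # hypothetical input with nothing before the first hyphen
# '*' allows an empty run, so this still "succeeds" and prints -4991
printf '%s\n' "$s" | sed -E 's,.*/([^-]*-[^-]*).*,\1,'
# '+' demands at least one [A-Z0-9] on each side; -n plus the p flag
# print only when the substitution happened, so this prints nothing
printf '%s\n' "$s" | sed -nE 's,.*/([A-Z0-9]+-[A-Z0-9]+).*,\1,p'
```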

If GNU awk is an option, use the same pattern but with a capture group (the third argument to match() is a GNU awk extension):

echo "home/JOHNSMITH-4991-common-task-list" | awk 'match($0, /[^\/]*\/([A-Z0-9]+-[A-Z0-9]+)/, a) {print a[1]}'

If the desired output is always the first run of the character class, a hyphen, and the character class again, GNU awk's FPAT can define fields by that pattern, and the first field is printed:

echo "home/JOHNSMITH-4991-common-task-list" | awk -v FPAT="[A-Z0-9]+-[A-Z0-9]+" '$0=$1'

Output

JOHNSMITH-4991

Answer:

Assumptions:

there could be more than one forward slash in the string
(after the last forward slash) there are 2 or more hyphens in the string
the desired output is the text between the last forward slash and the 2nd hyphen

One idea using parameter expansions:

$ string='home/dir/JOHNSMITH-4991-common-task-list'

$ string1="${string##*/}"
$ typeset -p string1
declare -- string1="JOHNSMITH-4991-common-task-list"

$ string1="${string1%%-*}"
$ typeset -p string1
declare -- string1="JOHNSMITH"

$ string2="${string##*/}"
$ string2="${string2#*-}"
$ typeset -p string2
declare -- string2="4991-common-task-list"

$ string2="${string2%%-*}"
$ typeset -p string2
declare -- string2="4991"

$ newstring="${string1}-${string2}"
$ echo "${newstring}"
JOHNSMITH-4991
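Without the typeset lines, the same expansions fit into a small POSIX sh function. The name extract_id is hypothetical, and the function relies on the same assumptions listed above:

```shell
#!/bin/sh
# Hypothetical helper: prints the text between the last "/" and the 2nd "-".
# Plain sh has no "local", so these variables are global.
extract_id() {
  last=${1##*/}        # drop everything up to the last slash
  first=${last%%-*}    # text before the first hyphen
  rest=${last#*-}      # text after the first hyphen
  second=${rest%%-*}   # text before the next hyphen
  printf '%s-%s\n' "$first" "$second"
}

extract_id 'home/dir/JOHNSMITH-4991-common-task-list'   # JOHNSMITH-4991
```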

NOTES:

the typeset commands are included only to show how the values progress
it's a bit of typing, but if this is done many times in bash, performance should be good compared to solutions that spawn a subprocess for each string
if a large number of strings must be parsed, the best performance will come from streaming all of them at once (e.g. via a file) to one of the other solutions (e.g. a single awk call that processes all strings will be faster than running the strings through a bash loop that performs these parameter expansions)
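As an illustration of the last note, a single awk process can handle every input line in one pass. The second input line is made up for the example; in practice the printf would be replaced by a file:

```shell
#!/bin/sh
# One awk process for all lines: with -F/ the last field ($NF) is the text
# after the last slash; split() then breaks it on hyphens into array p.
printf '%s\n' 'home/JOHNSMITH-4991-common-task-list' 'home/dir/JANEDOE-17-other-list' |
  awk -F/ '{ split($NF, p, "-"); print p[1] "-" p[2] }'
# JOHNSMITH-4991
# JANEDOE-17
```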