I would like to extract the string between the last two “-” characters. (“T-shirt” also contains a “-”, which ruins my results.)
For example:
T-shirt Layla gaga-papa-lk
So I want to return “papa”. I tried to use (?<=-)[^-)]+(?=-)
but it doesn’t work on this example because of the “-” in “T-shirt”.
On this example it does work:
Gh world papa -mama-p
CodePudding user response:
You might update the pattern to:
(?<=-)[^-\n]+(?=-[^-\n]*$)
The pattern matches:
(?<=-) Assert a - to the left
[^-\n]+ Match 1+ occurrences of any character except - or a newline
(?=-[^-\n]*$) Assert a - to the right, followed by any characters except - or a newline, up to the end of the string
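To make the lookaround version concrete, here is a minimal Python sketch that runs it against the two sample strings from the question. It assumes the quantifier after the character class is + (one or more), and the names LAST_SEGMENT, samples and results are only illustrative.

```python
import re

# Lookaround version; the + quantifier (one or more characters) is assumed.
LAST_SEGMENT = re.compile(r"(?<=-)[^-\n]+(?=-[^-\n]*$)")

samples = [
    "T-shirt Layla gaga-papa-lk",
    "Gh world papa -mama-p",
]

results = []
for text in samples:
    match = LAST_SEGMENT.search(text)
    # search() returns None when the string has fewer than two hyphens
    results.append(match.group() if match else None)

print(results)  # ['papa', 'mama']
```

The hyphen in "T-shirt" no longer interferes, because the lookahead only succeeds when exactly one more hyphen remains before the end of the string.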
Or using a capture group instead of lookarounds:
-([^-\n]+)-[^-\n]*$
The \n in the negated character class is there so the match does not cross newlines.
There is no ) in the example data, but if you also don't want to match that, you can add it to the negated character class: [^-\n)]
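The capture-group variant could be wrapped in a small helper like the one below. This is only a sketch in Python: the function name between_last_two_hyphens is made up for illustration, and the + quantifier inside the group is assumed.

```python
import re

# Capture-group version: group 1 holds the text between the last two hyphens.
CAPTURE = re.compile(r"-([^-\n]+)-[^-\n]*$")

def between_last_two_hyphens(text):
    """Return the text between the last two hyphens, or None if nothing matches."""
    match = CAPTURE.search(text)
    return match.group(1) if match else None

print(between_last_two_hyphens("T-shirt Layla gaga-papa-lk"))  # papa
print(between_last_two_hyphens("Gh world papa -mama-p"))       # mama
print(between_last_two_hyphens("T-shirt"))                     # None
```

With a capture group the wanted value is in group 1 rather than in the whole match, which also makes the pattern usable in engines that lack lookbehind.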
CodePudding user response:
You could match the following regular expression:
(?=[^\n-]*\-[^\n-]*$)[^\n-]*
(?=[^\n-]*\-[^\n-]*$) is a positive lookahead that becomes satisfied once the regex engine has moved its internal string pointer from the beginning of the string to just past the penultimate hyphen. If the string were "a-b-c-d-e", the lookahead would become satisfied when the pointer is between the third hyphen and the letter "d". The desired string therefore consists of the characters between that location and the next (last) hyphen.
The regex can be broken down as follows.
(?= # begin positive lookahead
[^\n-]* # match zero or more characters other than a newline or hyphen
\- # match a hyphen
[^\n-]* # match zero or more characters other than a newline or hyphen
$ # match end of string
) # end positive lookahead
[^\n-]* # match zero or more characters other than a newline or hyphen
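The commented breakdown above can be run almost as written in Python by compiling it with re.VERBOSE, which ignores whitespace and # comments in the pattern. This is a sketch; the names PATTERN and segment are illustrative.

```python
import re

PATTERN = re.compile(
    r"""
    (?=          # begin positive lookahead
      [^\n-]*    # match zero or more characters other than a newline or hyphen
      \-         # match a hyphen
      [^\n-]*    # match zero or more characters other than a newline or hyphen
      $          # match end of string
    )            # end positive lookahead
    [^\n-]*      # match zero or more characters other than a newline or hyphen
    """,
    re.VERBOSE,
)

def segment(text):
    """Return the first match of PATTERN in text, or None."""
    match = PATTERN.search(text)
    return match.group() if match else None

print(segment("a-b-c-d-e"))                   # d
print(segment("T-shirt Layla gaga-papa-lk"))  # papa
print(segment("a-b"))                         # a
```

Note the last call: when the string contains only one hyphen, the lookahead is already satisfied at the start of the string, so this pattern returns the text before that hyphen rather than failing to match.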
Notice the repetition in this expression, with [^\n-]* appearing three times. If the regex engine supports subroutines (or sub-expressions) the expression can be simplified as follows (I've used the PCRE engine syntax for illustration):
(?=([^\n-]*)\-(?1)$)(?1)
or
(?=(?P<non_hyphens>[^\n-]*)\-(?P>non_hyphens)$)(?P>non_hyphens)
Subroutines, especially with named groups, make regular expressions more compact and easier to follow, and they reduce careless coding errors.
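Python's built-in re module does not support subroutine calls such as (?1) or (?P>name). A different technique that removes the same repetition there is to keep the repeated fragment in a variable and build the pattern from it. The sketch below shows that substitute approach, not the subroutine syntax itself; NON_HYPHENS and PATTERN are illustrative names.

```python
import re

# The fragment that would otherwise be written out three times.
NON_HYPHENS = r"[^\n-]*"

# Build the full pattern from the shared fragment with an f-string.
PATTERN = re.compile(rf"(?={NON_HYPHENS}-{NON_HYPHENS}$){NON_HYPHENS}")

print(PATTERN.pattern)                      # (?=[^\n-]*-[^\n-]*$)[^\n-]*
print(PATTERN.search("a-b-c-d-e").group())  # d
```

The compiled pattern is the same as the fully written-out expression, so it behaves identically; only the source code is shorter.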