How can I get only the middle part of a combined name with PCRE regex?
name: 211103_TV_storyname_TYPE
result: storyname
I have used this single line: .(\d) .(_TV_)
to remove the first part: 211103_TV_
Another idea is to use (_TYPE)$
but the problem is that not all variations of the names contain a space separating a second word, so I can't rely on ^ for the first word and $ for the second.
The _TYPE and TV parts of the combined name are fixed. The numbers change according to the date, and the storyname is variable. Any ideas?
Thanks
CodePudding user response:
You could match as few characters as possible after _TV_ until you reach _TYPE
\d_TV_\K.*?(?=_TYPE)
\d_TV_
Match a digit and _TV_
\K
Forget what is matched until now
.*?
Match as few characters as possible
(?=_TYPE)
Assert _TYPE to the right
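For anyone trying this outside a PCRE engine, here is a minimal sketch in Python. Python's built-in re module does not support \K, so the sketch assumes a capturing group is an acceptable substitute. The helper name extract_story is hypothetical.

```python
import re

# Python's re module has no \K, so the part that \K would "keep"
# is wrapped in a capturing group instead (an assumption of this sketch).
MIDDLE = re.compile(r"\d_TV_(.*?)(?=_TYPE)")

def extract_story(name):
    """Return the text between _TV_ and _TYPE, or None if there is no match."""
    m = MIDDLE.search(name)
    return m.group(1) if m else None

print(extract_story("211103_TV_storyname_TYPE"))
```

In a PCRE engine the original pattern with \K returns the same text as the whole match, so no group is needed there.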
Another option without a non greedy quantifier, and leaving out the digit at the start:
_TV_\K[^_]*(?>_(?!TYPE)[^_]*)*(?=_TYPE)
_TV_
Match literally
\K[^_]*
Forget what is matched until now and optionally match any char except _
(?>_(?!TYPE)[^_]*)*
Only allow matching _ when not directly followed by TYPE
(?=_TYPE)
Assert _TYPE to the right
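A rough Python sketch of the same idea follows. It makes two substitutions that are assumptions of the sketch, not part of the pattern above: a capturing group replaces \K, and a plain non-capturing group (?:...) replaces the atomic group (?>...) so that it also runs on older Python versions. The helper name extract_story_with_underscores is hypothetical.

```python
import re

# Capturing group instead of \K; (?:...) instead of the atomic group (?>...).
# An underscore is only consumed when it is not directly followed by TYPE.
MIDDLE = re.compile(r"_TV_([^_]*(?:_(?!TYPE)[^_]*)*)(?=_TYPE)")

def extract_story_with_underscores(name):
    """Return the text between _TV_ and _TYPE; underscores inside it are allowed."""
    m = MIDDLE.search(name)
    return m.group(1) if m else None

print(extract_story_with_underscores("211103_TV_storyname_TYPE"))
print(extract_story_with_underscores("211103_TV_my_story_TYPE"))
```

The second call shows why the underscore branch is there: a storyname that itself contains _ is still captured as a whole.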
Edit
If you want to replace the 2 parts, you can use an alternation and replace with an empty string.
If they should be at the start and the end of the string, you can prepend ^ and append $ to the pattern.
\b\d{6}_TV_|_TYPE\b
\b\d{6}_TV_
A word boundary, match 6 digits and _TV_
|
Or
_TYPE\b
Match _TYPE followed by a word boundary
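The replacement approach could look like this in Python, as a sketch only. The function names strip_parts and strip_parts_anchored are hypothetical, and the anchored variant assumes ^ and $ are meant to replace the word boundaries.

```python
import re

# Remove the date prefix and the _TYPE suffix by replacing both with "".
UNANCHORED = re.compile(r"\b\d{6}_TV_|_TYPE\b")
# Variant that only matches at the very start and very end of the string.
ANCHORED = re.compile(r"^\d{6}_TV_|_TYPE$")

def strip_parts(name):
    return UNANCHORED.sub("", name)

def strip_parts_anchored(name):
    return ANCHORED.sub("", name)

print(strip_parts("211103_TV_storyname_TYPE"))
print(strip_parts_anchored("211103_TV_storyname_TYPE"))
```

For the sample name both variants leave only storyname. They differ when the name is embedded in a longer string, where only the word-boundary version removes the prefix.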
CodePudding user response:
With your shown samples, please try the following regex. It creates one capturing group that contains the matched value.
.*?_TV_([^_]*)(?=_TYPE)
Or, as a small variation of the above solution (thanks to The fourth bird's nice suggestion), the following works without the lazy match .*? used above:
_TV_([^_]*)(?=_TYPE)
Explanation of the first regex:
.*?_ ##Lazy match up to the first occurrence of _ that is followed by TV_.
TV_ ##Match TV_ literally.
([^_]*) ##First capturing group: everything before the next occurrence of _.
(?=_TYPE) ##Make sure the captured value is followed by _TYPE.
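To see how the two capture-group patterns compare, here is a small Python sketch. The function name first_group is hypothetical. The lookahead and the group syntax used here are also supported by Python's re module, so the patterns are copied unchanged.

```python
import re

LAZY = re.compile(r".*?_TV_([^_]*)(?=_TYPE)")   # with the leading lazy match
SHORT = re.compile(r"_TV_([^_]*)(?=_TYPE)")     # without it

def first_group(pattern, name):
    """Return capturing group 1 of the first match, or None."""
    m = pattern.search(name)
    return m.group(1) if m else None

sample = "211103_TV_storyname_TYPE"
print(first_group(LAZY, sample))
print(first_group(SHORT, sample))
# The overall match (group 0) differs, group 1 does not.
print(LAZY.search(sample).group(0))
print(SHORT.search(sample).group(0))
```

Both patterns return the same group 1. Only the overall match differs, because .*? also consumes the date prefix. Since [^_]* cannot cross an underscore, neither pattern matches when the storyname itself contains _.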