I'm looking to apply a regular expression to an input string.
Regular expression: (.*)\\(.*)_(.*)_(.*)-([0-9]{4}).*
Test entries:
- Parkman\L9\B137598_00_T-3298-B
- Parkman\L9\B137598_00_T-3298
The result should be B137598_00_T-3298 for both test entries. The problem is that if I append four more digits to a test entry, the result becomes, for example, B137598_00_T-3298-5555.
What I need is for everything after the 3298 to be ignored. What changes can I make to achieve that?
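A minimal sketch that reproduces the behaviour, assuming Python's re module (the helper name groups is made up for illustration). It shows that the fourth group (.*) is greedy: when a later hyphen is followed by four digits, that group stretches to it and swallows T-3298.

```python
import re

# The regular expression from the question, unchanged.
ORIGINAL = re.compile(r"(.*)\\(.*)_(.*)_(.*)-([0-9]{4}).*")

def groups(entry):
    """Return all five capture groups, or None if the entry does not match."""
    m = ORIGINAL.fullmatch(entry)
    return m.groups() if m else None

print(groups(r"Parkman\L9\B137598_00_T-3298-B"))
# ('Parkman\\L9', 'B137598', '00', 'T', '3298')
print(groups(r"Parkman\L9\B137598_00_T-3298-5555"))
# ('Parkman\\L9', 'B137598', '00', 'T-3298', '5555')
```

In the second entry the last group holds 5555 instead of 3298, which is why the extra digits end up in the result.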
CodePudding user response:
You can use a single capture group with a bit more specific pattern:
\w\\\w+\\((?:[^\W_]+_){2}[^\W_]+-[0-9]{4})\b
The pattern matches:
- \w Match a single word char
- \\\w+\\ Match 1+ word chars between backslashes
- ( Open capture group 1
- (?:[^\W_]+_){2} Repeat 2 times: 1+ word chars other than _, followed by a single _
- [^\W_]+ Match 1+ word chars other than _
- -[0-9]{4} Match - and 4 digits
- ) Close group 1
- \b A word boundary, so the 4 digits cannot be followed by another word char
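As a quick check, here is a minimal sketch of the capture-group pattern in Python. Python and the helper name extract are assumptions for illustration only; the pattern uses no Python-specific syntax, so it should behave the same way in .NET.

```python
import re

# Capture-group pattern from the answer; group 1 is the part to keep.
PATTERN = re.compile(r"\w\\\w+\\((?:[^\W_]+_){2}[^\W_]+-[0-9]{4})\b")

def extract(entry):
    """Return capture group 1, or None when the entry does not match."""
    m = PATTERN.search(entry)
    return m.group(1) if m else None

entries = [
    r"Parkman\L9\B137598_00_T-3298-B",
    r"Parkman\L9\B137598_00_T-3298",
    r"Parkman\L9\B137598_00_T-3298-5555",
]
for entry in entries:
    print(extract(entry))  # B137598_00_T-3298 for each entry
```

Because of the trailing \b, an entry with a fifth digit directly after the four (for example ...T-32985) gives no match at all rather than a truncated one.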
Or use a slightly broader pattern that returns the whole match instead of a capture group. Here \w+ also matches underscores, and a lookbehind asserts a backslash directly to the left:
(?<=\\)\w+-[0-9]{4}\b
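To illustrate what "broader" means here, the following sketch compares the lookbehind pattern with the stricter capture-group pattern. It assumes Python's re module; the helper names and the third sample entry (A_B_C_D-1234) are invented for the comparison and do not come from the question.

```python
import re

# Stricter pattern: exactly two underscore-separated parts before the last part.
STRICT = re.compile(r"\w\\\w+\\((?:[^\W_]+_){2}[^\W_]+-[0-9]{4})\b")
# Broader pattern: any run of word chars (underscores included) after a backslash.
BROAD = re.compile(r"(?<=\\)\w+-[0-9]{4}\b")

def extract_strict(entry):
    """Return group 1 of the stricter pattern, or None."""
    m = STRICT.search(entry)
    return m.group(1) if m else None

def extract_broad(entry):
    """Return the whole match of the lookbehind pattern, or None."""
    m = BROAD.search(entry)
    return m.group(0) if m else None

print(extract_broad(r"Parkman\L9\B137598_00_T-3298-5555"))  # B137598_00_T-3298
print(extract_broad(r"Parkman\L9\A_B_C_D-1234"))            # A_B_C_D-1234
print(extract_strict(r"Parkman\L9\A_B_C_D-1234"))           # None
```

The broader pattern accepts any number of underscores, so it is the better fit only if the name format may vary.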
CodePudding user response:
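C# code:
using System;
using System.Text.RegularExpressions;

// Verbatim strings (@"...") keep backslashes literal, so they must not be doubled.
string s1 = @"Parkman\L9\B137598_00_T-3298-B";
string s2 = @"Parkman\L9\B137598_00_T-3298";
// 1+ word chars, "_", two digits, "_T-", then exactly four digits.
string pattern = @"\w+_[0-9]{2}_T-[0-9]{4}";
var match = Regex.Match(s1, pattern);
Console.WriteLine("s1: {0}", match.Value);
match = Regex.Match(s2, pattern);
Console.WriteLine("s2: {0}", match.Value);
Result: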
s1: B137598_00_T-3298
s2: B137598_00_T-3298