Regular expression: excluding the last part

Time: 10-19

I'm looking to apply a regular expression to an input string.

Regular expression:

(.*)\\(.*)_(.*)_(.*)-([0-9]{4}).*

Test entries:

  • Parkman\L9\B137598_00_T-3298-B
  • Parkman\L9\B137598_00_T-3298

The result should be B137598_00_T-3298 for both test entries. The problem is that if a test entry has another four digits after 3298, the result becomes, for example, B137598_00_T-3298-5555 instead.

What I need is for anything after 3298 to be ignored. What changes can I make to achieve that?
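A quick way to see where the extra digits come from is to run the original pattern against an entry that has a second four-digit run. The sketch below is in C#, like the answer further down; the input Parkman\L9\B137598_00_T-3298-5555 is an assumption based on the description above. Because the fourth group (.*) is greedy, it expands to T-3298 and leaves only the last four digits for ([0-9]{4}). Making just that group lazy, (.*?), stops it at the first - that is followed by four digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed input: a second four-digit run appended after 3298.
string input = @"Parkman\L9\B137598_00_T-3298-5555";

// The pattern from the question: group 4 is greedy (.*).
string greedyPattern = @"(.*)\\(.*)_(.*)_(.*)-([0-9]{4}).*";
// Same pattern with only group 4 made lazy (.*?).
string lazyPattern = @"(.*)\\(.*)_(.*)_(.*?)-([0-9]{4}).*";

Match greedyMatch = Regex.Match(input, greedyPattern);
Match lazyMatch = Regex.Match(input, lazyPattern);

// Greedy: group 4 = "T-3298", group 5 = "5555"
Console.WriteLine($"greedy: group4={greedyMatch.Groups[4].Value}, group5={greedyMatch.Groups[5].Value}");
// Lazy: group 4 = "T", group 5 = "3298"
Console.WriteLine($"lazy:   group4={lazyMatch.Groups[4].Value}, group5={lazyMatch.Groups[5].Value}");
```

With the lazy version, groups 2 to 5 hold B137598, 00, T and 3298, which can be joined with _, _ and - to give B137598_00_T-3298; the trailing .* consumes the rest of the input.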

Answer:

You can use a single capture group with a more specific pattern:

\w\\\w+\\((?:[^\W_]+_){2}[^\W_]+-[0-9]{4})\b

The pattern matches:

  • \w Match a single word char
  • \\\w+\\ Match 1+ word chars between backslashes
  • ( Capture group 1
    • (?:[^\W_]+_){2} Repeat 2 times: 1+ word chars without _ followed by a single _
    • [^\W_]+ Match 1+ word chars without _
    • -[0-9]{4} Match - and 4 digits
  • ) Close group 1
  • \b A word boundary

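A minimal C# sketch of the capture-group pattern above, run against the two test entries plus an assumed third entry with an extra four-digit run (-5555). Only Groups[1] is printed: the leading \w\\\w+\\ part is part of the overall match but not of group 1.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above; the wanted value is in capture group 1.
string pattern = @"\w\\\w+\\((?:[^\W_]+_){2}[^\W_]+-[0-9]{4})\b";

// The two test entries from the question, plus an assumed third one
// with another four-digit run appended (the problem case).
string[] inputs =
{
    @"Parkman\L9\B137598_00_T-3298-B",
    @"Parkman\L9\B137598_00_T-3298",
    @"Parkman\L9\B137598_00_T-3298-5555",
};

foreach (string input in inputs)
{
    Match m = Regex.Match(input, pattern);
    // Groups[1] holds only the text inside the capturing parentheses.
    Console.WriteLine(m.Success ? m.Groups[1].Value : "(no match)");
}
```

The \b after the four digits is what stops the match at 3298: it needs a non-word character (such as -) or the end of the input right after the last digit.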

Or a broader pattern that uses only the match itself (no capture group), where \w also matches an underscore, with a lookbehind asserting \ to the left:

(?<=\\)\w+-[0-9]{4}\b

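The match-only variant can be used the same way, reading Match.Value instead of a group. This is again a C# sketch with the same assumed third input; .NET's regex engine supports the (?<=...) lookbehind used here.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above: no capture group, the whole match is the result.
// (?<=\\) requires a backslash immediately before the match; \w+ also matches "_".
string pattern = @"(?<=\\)\w+-[0-9]{4}\b";

// Test entries from the question, plus an assumed entry with an extra "-5555".
string[] inputs =
{
    @"Parkman\L9\B137598_00_T-3298-B",
    @"Parkman\L9\B137598_00_T-3298",
    @"Parkman\L9\B137598_00_T-3298-5555",
};

foreach (string input in inputs)
{
    Match m = Regex.Match(input, pattern);
    Console.WriteLine(m.Success ? m.Value : "(no match)");
}
```

Because \w+ cannot cross a -, and \b requires that the four digits are not followed by another word character, each of the three inputs yields B137598_00_T-3298.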

Answer:

C# code:

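        using System;
        using System.Text.RegularExpressions;

        string s1 = @"Parkman\L9\B137598_00_T-3298-B";
        string s2 = @"Parkman\L9\B137598_00_T-3298";
        // \w+ matches the code (B137598), then _ and two digits (_00), the literal _T-, and four digits
        string pattern = @"\w+_[0-9]{2}_T-[0-9]{4}";

        // The pattern stops after the four digits, so a trailing -B (or -5555) is not part of the match
        Console.WriteLine("s1: {0}", Regex.Match(s1, pattern).Value);
        Console.WriteLine("s2: {0}", Regex.Match(s2, pattern).Value);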

Output:

s1: B137598_00_T-3298

s2: B137598_00_T-3298
