Regex to split a string at a period (dot) that follows either a number or letters

Time:07-29

Some of my strings contain a display code, which may or may not be present. If a string contains a display code (either a dotted decimal number or letters), I want to split the string and capture the code and the title separately.

Note: The examples are for reference only; the actual strings do not start with the word "Title".

Sample Strings:

1. Title1 Contains Space and Characters
2. Title2 Contains Space and Characters
2.1. Title2.1 Contains Space and Characters
2.1.1.  Title2.1.1 Contains Space and Characters
3.  Title3 Contains Space and Characters
3.1.  Title3.1 Contains Space and Characters
A.  Title A Contains Space and Characters
B.  Title B Contains Space and Characters
C.  Title C Contains Space and Characters
Title which does not contains any Display Code

Output:

1 - Title1 Contains Space and Characters

2 - Title2 Contains Space and Characters

2.1 - Title2.1 Contains Space and Characters

2.1.1 - Title2.1.1 Contains Space and Characters

3 - Title3 Contains Space and Characters

3.1 - Title3.1 Contains Space and Characters

A - Title A Contains Space and Characters

B - Title B Contains Space and Characters

C - Title C Contains Space and Characters

Title which does not contains any Display Code

Code: I am trying to split the string on the dot (.) to produce the output shown above.

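string Title = "1. Title1 Contains Space and Characters";
string displayCode = String.Empty;
// Treat the string as coded if it starts with a digit or has a dot within its first two characters.
if (Char.IsDigit(Title[0]) || Title.Substring(0, 2).Contains('.'))
{
    // Split once on the first space: the left part is the code, the right part is the title.
    displayCode = Title.Split(new[] { ' ' }, 2)[0].TrimEnd('.');
    Title = Title.Split(new[] { ' ' }, 2)[1];
}
Console.WriteLine($" Display Code: {displayCode} Title: {Title}");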

How can I achieve this using Split, without Substring?

How can I achieve this using a regex?

Which approach is the most efficient?

CodePudding user response:

With a regex, maybe something like this: ^(?<displayCode>[\w.]+)\.\ +(?<title>.*)|^
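As a rough sketch of how that pattern could be used from C#: the class name `DisplayCodeSketch` and the method `Parse` are hypothetical names chosen for illustration, and the quantifiers in the pattern are assumed to be `+`. The trailing empty alternative (`|^`) lets the match succeed even when there is no display code, so the code checks whether the `displayCode` group itself participated.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DisplayCodeSketch
{
    // Pattern from the answer above; the '+' quantifiers are an assumption.
    private static readonly Regex Pattern =
        new Regex(@"^(?<displayCode>[\w.]+)\.\ +(?<title>.*)|^");

    // Hypothetical helper: returns an empty code when no display code is found.
    public static (string Code, string Title) Parse(string input)
    {
        Match m = Pattern.Match(input);
        Group code = m.Groups["displayCode"];
        return code.Success
            ? (code.Value, m.Groups["title"].Value)
            : ("", input);
    }

    public static void Main()
    {
        string[] inputs =
        {
            "2.1.1.  Title2.1.1 Contains Space and Characters",
            "A.  Title A Contains Space and Characters",
            "Title which does not contains any Display Code"
        };

        foreach (string input in inputs)
        {
            var (code, title) = Parse(input);
            Console.WriteLine(code.Length > 0 ? $"{code} - {title}" : title);
        }
    }
}
```

Because `[\w.]+` accepts any run of word characters, a string such as "Test. No code" would be read as code "Test"; a stricter character class is needed if that is not wanted.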

CodePudding user response:

Try the following:

           string[] inputs = {
                "1. Title1 Contains Space and Characters",
                "2. Title2 Contains Space and Characters",
                "2.1. Title2.1 Contains Space and Characters",
                "2.1.1.  Title2.1.1 Contains Space and Characters",
                "3.  Title3 Contains Space and Characters",
                "3.1.  Title3.1 Contains Space and Characters",
                "A.  Title A Contains Space and Characters",
                "B.  Title B Contains Space and Characters",
                "C.  Title C Contains Space and Characters",
                "Title which does not contains any Display Code"
            };
            string pattern = @"^(?'head'[^\s]+)\s+(?'tail'.*)";

            foreach(string input in inputs)
            {
                if(input.Contains("."))
                {
                    Match match = Regex.Match(input, pattern);
                    Console.WriteLine("{0} - {1}", match.Groups["head"].Value, match.Groups["tail"].Value);
                }
                else
                {
                    Console.WriteLine(input);
                }
            }
            Console.ReadLine();

CodePudding user response:

I suggest matching instead of splitting:

private static (string title, string text) MySplit(string value) {
  if (string.IsNullOrEmpty(value))
    return ("", "");

  var match = Regex.Match(
     value, 
    @"^(?<title>([0-9]+|[A-Za-z])(\.([0-9]+|[A-Za-z]))*\.)\s*(?<text>.*)$");

  return match.Success
    ? (match.Groups["title"].Value, match.Groups["text"].Value)
    : ("", value);
 }

Demo:

  string[] tests = new string[] {
    "1. Title1 Contains Space and Characters",
    "2. Title2 Contains Space and Characters",
    "2.1. Title2.1 Contains Space and Characters",
    "2.1.1.  Title2.1.1 Contains Space and Characters",
    "3.  Title3 Contains Space and Characters",
    "3.1.  Title3.1 Contains Space and Characters",
    "A.  Title A Contains Space and Characters",
    "B.  Title B Contains Space and Characters",
    "C.  Title C Contains Space and Characters",
    "Title which does not contains any Display Code",
    "12.345. Long code",
    "1.A.2. Combined code",
    "12.a.4.3.B. Complex display code",
    "Test. No code - 'Test.' is not a display code",
  };

  string report = string.Join(Environment.NewLine, tests
    .Select(test => MySplit(test))
    .Select(pair => $"{pair.title,11} :: {pair.text}"));

  Console.Write(report);

Output:

         1. :: Title1 Contains Space and Characters
         2. :: Title2 Contains Space and Characters
       2.1. :: Title2.1 Contains Space and Characters
     2.1.1. :: Title2.1.1 Contains Space and Characters
         3. :: Title3 Contains Space and Characters
       3.1. :: Title3.1 Contains Space and Characters
         A. :: Title A Contains Space and Characters
         B. :: Title B Contains Space and Characters
         C. :: Title C Contains Space and Characters
            :: Title which does not contains any Display Code
    12.345. :: Long code
     1.A.2. :: Combined code
12.a.4.3.B. :: Complex display code
            :: Test. No code - 'Test.' is not a display code

CodePudding user response:

You could split on a dot followed by 1 or more whitespace chars, while asserting the display code to the left:

(?<=^(?:\d+(?:\.\d+)*|[A-Z]+))\.\s+

Explanation

  • (?<= Positive lookbehind
    • ^ Start of string
    • (?: Non capture group
      • \d+(?:\.\d+)* 1 or more digits with optional decimal parts
      • | Or
      • [A-Z]+ Match 1 or more uppercase chars
    • ) Close non capture group
  • ) Close lookbehind
  • \.\s+ Match a dot and 1 or more whitespace chars to split on
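A minimal sketch of the split approach in C#, assuming the `+` quantifiers shown in the pattern; `LookbehindSplitSketch` and `SplitDisplayCode` are hypothetical names. .NET allows variable-length lookbehind, which is what makes this pattern usable with `Regex.Split`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindSplitSketch
{
    // Split on ". " only when everything before the dot is a display code.
    private static readonly Regex Splitter =
        new Regex(@"(?<=^(?:\d+(?:\.\d+)*|[A-Z]+))\.\s+");

    // Hypothetical helper: two parts when a code is present, otherwise one.
    public static string[] SplitDisplayCode(string input) => Splitter.Split(input);

    public static void Main()
    {
        string[] inputs =
        {
            "2.1.1.  Title2.1.1 Contains Space and Characters",
            "A.  Title A Contains Space and Characters",
            "Title which does not contains any Display Code"
        };

        foreach (string input in inputs)
        {
            string[] parts = SplitDisplayCode(input);
            Console.WriteLine(parts.Length == 2 ? $"{parts[0]} - {parts[1]}" : input);
        }
    }
}
```

The pattern contains only non-capturing groups, so `Regex.Split` does not add captured text to the result array.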


Or you could use 2 capture groups to capture the display code and the rest of the line:

^((?:\d+(?:\.\d+)*|[A-Z]+))\.\s+(.+)

Explanation

  • ^ Start of string
  • ( Capture group 1
    • (?:\d+(?:\.\d+)*|[A-Z]+) Match either 1 or more digits with optional decimal parts or 1 or more chars A-Z
  • ) Close group 1
  • \.\s+ Match a dot and 1 or more whitespace chars
  • (.+) Capture group 2, match the rest of the line
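The two-group variant could be used roughly as follows; this is a sketch under the same assumption about the `+` quantifiers, and `CaptureGroupSketch` and `Parse` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupSketch
{
    // Group 1 is the display code, group 2 is the rest of the line.
    private static readonly Regex Pattern =
        new Regex(@"^((?:\d+(?:\.\d+)*|[A-Z]+))\.\s+(.+)");

    // Hypothetical helper: returns an empty code when the pattern does not match.
    public static (string Code, string Title) Parse(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success
            ? (m.Groups[1].Value, m.Groups[2].Value)
            : ("", input);
    }

    public static void Main()
    {
        string[] inputs =
        {
            "3.1.  Title3.1 Contains Space and Characters",
            "C.  Title C Contains Space and Characters",
            "Title which does not contains any Display Code"
        };

        foreach (string input in inputs)
        {
            var (code, title) = Parse(input);
            Console.WriteLine(code.Length > 0 ? $"{code} - {title}" : title);
        }
    }
}
```

Unlike the split version, this returns the code and the title in one match, and the caller decides what to do when `Success` is false.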
