Regex split preserving strings and escape character

Time:11-01

I need to split a string in C# on spaces while preserving quoted substrings; that part works. Additionally, I want to support the escape sequence \" so that a quoted substring can contain quotes of its own.

Example of what I need:

One Two "Three Four" "Five \"Six\""

To:

  • One
  • Two
  • Three Four
  • Five "Six"

This is the regex I am currently using. It works for all cases except "Five \"Six\"":

// Split on spaces unless in quotes
List<string> matches = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
    .Cast<Match>()
    .Select(x => x.Value.Trim('"'))
    .ToList();
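For reference, here is a minimal sketch that reproduces the failure, assuming the pattern's quantifiers (partly lost on this page) are `.+?` and `[^ ]+`. The class name `OriginalAttempt` is made up for the demo. The lazy `.+?` stops at the first `"` it reaches, which is the escaped one after `Five \`, and the remainder `Six\""` is then picked up by `[^ ]+`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OriginalAttempt
{
    // The question's pattern (assumed quantifiers): a lazily matched quoted run,
    // or a run of non-space characters; surrounding quotes are trimmed afterwards.
    public static List<string> Split(string input) =>
        Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
            .Cast<Match>()
            .Select(x => x.Value.Trim('"'))
            .ToList();

    public static void Main()
    {
        var input = "One Two \"Three Four\" \"Five \\\"Six\\\"\"";
        // The escaped quote ends the lazy match early, so the last token breaks in two.
        foreach (var s in Split(input))
            Console.WriteLine(s);
    }
}
```

Running it prints `One`, `Two`, `Three Four`, `Five \` and `Six\` on separate lines, so the last quoted token comes out as two broken pieces.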

I'm looking for any regex that would do the trick.

Answer:

You can use

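using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var input = "One Two \"Three Four\" \"Five \\\"Six\\\"\"";
// Console.WriteLine(input); // => One Two "Three Four" "Five \"Six\""
// Group "r" holds either the quoted content or a bare token; escapes are then removed
List<string> matches = Regex.Matches(input, @"(?s)""(?<r>[^""\\]*(?:\\.[^""\\]*)*)""|(?<r>\S+)")
    .Cast<Match>()
    .Select(x => Regex.Replace(x.Groups["r"].Value, @"\\(.)", "$1"))
    .ToList();
foreach (var s in matches)
    Console.WriteLine(s);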


The result is

One
Two
Three Four
Five "Six"
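If the split runs many times, the two patterns from the answer (with the `+` in `\S+` restored) can be built once and reused. A sketch, where `QuotedSplitter` and `Split` are hypothetical names; the patterns themselves are the answer's:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedSplitter
{
    // The answer's two patterns, constructed once so repeated calls reuse them.
    private static readonly Regex Token =
        new Regex(@"(?s)""(?<r>[^""\\]*(?:\\.[^""\\]*)*)""|(?<r>\S+)");
    private static readonly Regex Escaped = new Regex(@"\\(.)");

    // Splits on whitespace, keeps quoted runs whole, and removes the
    // backslash in front of each escaped char.
    public static List<string> Split(string input) =>
        Token.Matches(input)
            .Cast<Match>()
            .Select(m => Escaped.Replace(m.Groups["r"].Value, "$1"))
            .ToList();

    public static void Main()
    {
        // Tabs and repeated spaces both separate tokens; "" yields an empty item.
        foreach (var s in Split("  a\t\"\"  b "))
            Console.WriteLine($"[{s}]");
    }
}
```

This prints `[a]`, `[]` and `[b]`. Note that `\S+` splits on any whitespace, including tabs, unlike the question's `[^ ]+`, and an empty quoted string `""` produces an empty item rather than being skipped.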

The (?s)"(?<r>[^"\\]*(?:\\.[^"\\]*)*)"|(?<r>\S+) regex matches:

  • (?s) - the inline equivalent of RegexOptions.Singleline, which makes . match newlines too, so an escaped line break inside quotes is also accepted
  • "(?<r>[^"\\]*(?:\\.[^"\\]*)*)" - an opening ", then Group "r" capturing zero or more chars other than " and \, followed by zero or more sequences of an escaped char (a \ plus any char) and zero or more chars other than " and \; finally a closing " is matched
  • | - or
  • (?<r>\S+) - Group "r": one or more non-whitespace chars.
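The pieces listed above can also be written out inside the pattern itself using RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment; RegexOptions.Singleline stands in for the inline (?s). This is a sketch of an equivalent restatement for readability (the class name `AnnotatedPattern` is hypothetical), not the answer's code:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnnotatedPattern
{
    // Same pattern as the answer; whitespace and # comments are ignored by the engine.
    public static readonly Regex Token = new Regex(@"
        ""                      # opening quote
        (?<r>                   # group r: the quoted content
            [^""\\]*            #   run of chars that are neither quote nor backslash
            (?:\\.[^""\\]*)*    #   an escaped char, then another such run, repeated
        )
        ""                      # closing quote
      |                         # or
        (?<r>\S+)               # group r: a run of non-whitespace chars
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    public static void Main()
    {
        var input = "One Two \"Three Four\" \"Five \\\"Six\\\"\"";
        // Prints the raw group r values; no unescaping happens here.
        foreach (Match m in Token.Matches(input))
            Console.WriteLine(m.Groups["r"].Value);
    }
}
```

Because it prints the raw Group "r" values, the last line is still `Five \"Six\"`; removing the backslashes is a separate step.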

The .Select(x => Regex.Replace(x.Groups["r"].Value, @"\\(.)", "$1")) call takes the Group "r" value and unescapes every escaped char by deleting the \ in front of it.
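The unescape step can be tried on its own. In this sketch the names are hypothetical: `Unescape` uses the answer's pattern exactly, while `UnescapeSingleline` is a variant added here on the assumption that you might want it. Without (?s), the . in \\(.) does not match a newline, so a backslash followed by a line break (which the (?s) matching pattern does accept inside quotes) would be left untouched:

```csharp
using System;
using System.Text.RegularExpressions;

public static class Unescaper
{
    // The answer's unescape step: drop each backslash and keep the char after it.
    public static string Unescape(string s) => Regex.Replace(s, @"\\(.)", "$1");

    // Assumed variant: (?s) lets . match a newline, so "\" + line break is unescaped too.
    public static string UnescapeSingleline(string s) => Regex.Replace(s, @"(?s)\\(.)", "$1");

    public static void Main()
    {
        Console.WriteLine(Unescape("Five \\\"Six\\\""));   // Five "Six"
        Console.WriteLine(Unescape("a\\\\b"));             // a\b
    }
}
```

Since the replacement scans left to right without overlaps, an escaped backslash (`\\`) collapses to a single `\` and is not treated as the start of another escape.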
