Get values from a string based on a format

Time:10-18

I am trying to extract individual values from a string based on a format. The format can change, so ideally I want to specify it using another string.

For example, say my input is 1. Line One - Part Two (Optional Third Part). I would want to specify the format to match as %number%. %first% - %second% (%third%), and then get these values as variables.

The only way I could think of doing this was with RegEx groups, and I have very nearly got the RegEx working.

var input = "1. Line One - Part Two (Optional Third Part)";

var formatString = "%number%. %first% - %second% (%third%)";
    
var expression = new Regex("(?<Number>[^.]+). (?<First>[^-]+) - (?<Second>[^\\(]+) ((?<Third>[^)]+))");
    
var match = expression.Match(input);
    
Console.WriteLine(match.Groups["Number"].ToString().Trim());
Console.WriteLine(match.Groups["First"].ToString().Trim());
Console.WriteLine(match.Groups["Second"].ToString().Trim());
Console.WriteLine(match.Groups["Third"].ToString().Trim());

This results in the following output, so all good apart from that opening bracket.

1
Line One
Part Two
(Optional Third Part

I'm now a bit lost as to how I could translate my format string into a regular expression, now there are no rules on this format, but it would need to be fairly easy for a user.

Any advice is greatly appreciated, or perhaps there is another way not involving Regex?

CodePudding user response:

You may use this regex:

^(?<Number>[^.]+)\. (?<First>[^-]+) - (?<Second>[^(]+)(?: \((?<Third>[^)]+)\))?$
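To see how the optional group behaves, here is a minimal sketch that runs the pattern above against a line with and without the bracketed part. The Parse helper and the second input line are assumptions made up for illustration; they are not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored pattern from above; the third part sits inside an optional
// non-capturing group, so lines without "(...)" still match.
var expression = new Regex(
    @"^(?<Number>[^.]+)\. (?<First>[^-]+) - (?<Second>[^(]+)(?: \((?<Third>[^)]+)\))?$");

// Hypothetical helper: returns the four captured parts,
// or an empty array when the line does not match at all.
string[] Parse(string line)
{
    var m = expression.Match(line);
    if (!m.Success)
    {
        return Array.Empty<string>();
    }

    return new[]
    {
        m.Groups["Number"].Value,
        m.Groups["First"].Value,
        m.Groups["Second"].Value,
        m.Groups["Third"].Value, // empty string when the optional part is absent
    };
}

// prints: 1 | Line One | Part Two | Optional Third Part
Console.WriteLine(string.Join(" | ", Parse("1. Line One - Part Two (Optional Third Part)")));

// Still matches; the Third group is simply empty here.
Console.WriteLine(string.Join(" | ", Parse("2. Line One - Part Two")));
```

Because the pattern is anchored with ^ and $, the Second group backtracks off its trailing space, so no Trim() calls should be needed.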


If you want to keep your syntax, you can leverage the Regex.Escape method. I have also written some code that parses all parameters enclosed in % signs:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var input = "1. Line One - Part Two (Optional Third Part)";

var formatString = "%number%. %first% - %second% (%third%)";

formatString = Regex.Escape(formatString);

var parameters = new List<string>();
formatString = Regex.Replace(formatString, "%([^%]+)%", match =>
{
    var paramName = match.Groups[1].Value;
    var groupPattern = "(?<" + paramName + ">{" + parameters.Count + "})";
    parameters.Add(paramName);
    return groupPattern;
});

var pattern = string.Format(
    formatString, 
    "[^\\.]+", 
    "[^\\-]+", 
    "[^\\(]+", 
    "[^\\)]+");

var match = Regex.Match(input, pattern);

foreach (var paramName in parameters)
{
    Console.WriteLine(match.Groups[paramName]);
}

Further notes

You need to adjust the part where you specify the pattern for each group. Currently it is not generic: the four sub-patterns are hard-coded, regardless of how many parameters the format string contains.

So finally, taking it all into account and cleaning up the code a little, you can use a solution like this:

public static class FormatBasedCustomRegex
{
    public static string GetPattern(this string formatString,
        string[] subpatterns,
        out string[] parameters)
    {
        formatString = Regex.Escape(formatString);

        formatString = formatString.ReplaceParams(out var @params);

        if(@params.Length != subpatterns.Length)
        {
            throw new InvalidOperationException();
        }

        parameters = @params;

        return string.Format(
            formatString,
            subpatterns);
    }

    private static string ReplaceParams(
        this string formatString, 
        out string[] parameters)
    {
        var @params = new List<string>();
        var outputPattern = Regex.Replace(formatString, "%([^%]+)%", match =>
        {
            var paramName = match.Groups[1].Value;
            var groupPattern = "(?<" + paramName + ">{" + @params.Count + "})";
            @params.Add(paramName);
            return groupPattern;
        });

        parameters = @params.ToArray();

        return outputPattern;
    }
}

The main method would then look like this:


var input = "1. Line One - Part Two (Optional Third Part)";

var pattern = "%number%. %first% - %second% (%third%)".GetPattern(
    new[] 
    {
        "[^\\.]+",
        "[^\\-]+",
        "[^\\(]+",
        "[^\\)]+",
    },
    out var parameters);

var match = Regex.Match(input, pattern);

foreach (var paramName in parameters)
{
    Console.WriteLine(match.Groups[paramName]);
}

But it's up to you how you define the particular methods and what signatures they should have, so that the code works best for you :)

CodePudding user response:

Your format contains special characters that are becoming part of the regular expression. You can use the Regex.Escape method to handle that. After that, you can just use a Regex.Replace with a delegate to transform the format into a regular expression:

var input = "1. Line One - Part Two (Optional Third Part)";
var fmt = "%number%. %first% - %second% (%third%)";

var templateRE = new Regex(@"%([a-z]+)%", RegexOptions.Compiled);
var pattern = templateRE.Replace(Regex.Escape(fmt), m => $"(?<{m.Groups[1].Value}>.+?)");

var ansRE = new Regex(pattern);
var ans = ansRE.Match(input);

Note: You may want to place ^ and $ at the beginning and end of the pattern respectively, to ensure the format must match the entire input string.
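As a rough sketch of that note, the pattern can be wrapped in ^ and $ when it is built, and the captured values collected by placeholder name. The BuildPattern and Extract helpers below are hypothetical names introduced only for this illustration, and the sketch assumes placeholder names are lowercase letters, as in the template regex above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Finds %name% placeholders; only lowercase names are recognised.
var templateRE = new Regex(@"%([a-z]+)%");

// Hypothetical helper: escapes the format, turns each placeholder into a
// lazy named group, and anchors the result to the whole input.
string BuildPattern(string fmt) =>
    "^" + templateRE.Replace(Regex.Escape(fmt), m => $"(?<{m.Groups[1].Value}>.+?)") + "$";

// Hypothetical helper: returns placeholder name -> captured value,
// or an empty dictionary when the input does not fit the format.
Dictionary<string, string> Extract(string fmt, string input)
{
    var result = new Dictionary<string, string>();
    var match = Regex.Match(input, BuildPattern(fmt));
    if (!match.Success)
    {
        return result;
    }

    foreach (Match placeholder in templateRE.Matches(fmt))
    {
        var name = placeholder.Groups[1].Value;
        result[name] = match.Groups[name].Value;
    }

    return result;
}

var values = Extract(
    "%number%. %first% - %second% (%third%)",
    "1. Line One - Part Two (Optional Third Part)");

foreach (var pair in values)
{
    Console.WriteLine($"{pair.Key} = {pair.Value}");
}
```

With the anchors in place, an input that has extra text after the closing bracket no longer matches, so Extract returns an empty dictionary for it.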
