Why does a Regex in a while loop match only the length of the first occurrence (not dynamic in a while loop)?

Time:12-26

I have a regex which I would expect to dynamically capture each group of zeros. Instead, I get a list full of identical matches, e.g. [00, 00, 00, 00, 00], from a string like "001111110000001100110011111".

I've tried putting my var regex = new Regex() inside the while loop in the hope that this might solve my problem. Whatever I try, the regex returns only the length of the first occurrence of zeros instead of filling my collection with runs of varying length.

List<string> ZerosMatch(string input)
{
    var newInput = input;
    var list = new List<string>();
    var regex = new Regex(@"[0]{1,}");
    var matches = regex.Match(newInput);

    while (matches.Success)
    {
        list.Add(matches.Value);

        try 
        {
            newInput = newInput.Remove(0, matches.Index);
        }
        catch
        {
            break;
        }                                      
    }
    return list;
}

vs

List<string> ZerosMatch(string input)
{
    var newInput = input;
    var list = new List<string>();
    bool hasMatch = true;

    while (hasMatch)
    {
        try 
        {
            var regex = new Regex(@"[0]{1,}");
            var matches = regex.Match(newInput);
            newInput = newInput.Remove(0, matches.Index);
            list.Add(matches.Value);
            hasMatch = matches.Success;
        }
        catch
        {
            break;
        }                                      
    }
    return list;
}

My question is: why is this happening?

Answer:

In your first approach, you execute regex.Match only once, so you keep looking at the very same match until your code throws an exception. Depending on whether your first match is at index 0 or later, you get either an ArgumentOutOfRangeException (because Remove(0, matches.Index) eventually asks to remove more characters than the shrinking string still has; your catch block swallows it, so the loop ends with a few copies of the same match) or an OutOfMemoryException (because you remove nothing from your string but keep adding to your result list indefinitely).
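To see both outcomes without the program hanging, here is a minimal sketch that reruns the question's first loop with an added iteration cap. The class FirstApproachRepro and the maxIterations parameter are hypothetical, introduced only for this demonstration; the catch is narrowed to ArgumentOutOfRangeException, which is what String.Remove throws here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FirstApproachRepro
{
    // Same logic as the question's first ZerosMatch, plus a hypothetical
    // maxIterations cap so the "index 0" case terminates instead of running out of memory.
    public static List<string> Run(string input, int maxIterations)
    {
        var newInput = input;
        var list = new List<string>();
        var regex = new Regex(@"[0]{1,}");
        var matches = regex.Match(newInput); // evaluated only once, never updated

        while (matches.Success && list.Count < maxIterations)
        {
            list.Add(matches.Value);
            try
            {
                // matches.Index never changes: if it is 0 this removes nothing;
                // otherwise it removes the same count every time until it throws.
                newInput = newInput.Remove(0, matches.Index);
            }
            catch (ArgumentOutOfRangeException)
            {
                break;
            }
        }
        return list;
    }
}
```

With the question's input, Run("001111110000001100110011111", 5) returns five copies of "00" (only the cap stops it), while Run("1100", 100) returns three copies of "00": the string shrinks to "00", then to "", and the third Remove throws.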

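Your second approach runs into the same OutOfMemoryException as soon as the input contains a 0. Remove(0, matches.Index) deletes only the text before the match, so after the first iteration newInput starts with that run of zeros; every later Match finds it again at index 0, removes nothing, and adds the same value again.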
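A small trace makes this visible. The sketch below (the class SecondApproachTrace and its steps parameter are hypothetical, added only for illustration) repeats the second loop's body a fixed number of times and records the index of each match instead of the value.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SecondApproachTrace
{
    // Re-runs the question's second loop body for a fixed number of steps
    // and records where each match starts.
    public static List<int> MatchIndices(string input, int steps)
    {
        var newInput = input;
        var regex = new Regex(@"[0]{1,}");
        var indices = new List<int>();

        for (int i = 0; i < steps; i++)
        {
            var match = regex.Match(newInput);
            if (!match.Success) break;
            indices.Add(match.Index);
            // Removes only the text *before* the match, so the match itself
            // becomes the start of newInput and is found again at index 0.
            newInput = newInput.Remove(0, match.Index);
        }
        return indices;
    }
}
```

For example, MatchIndices("110011", 4) yields [2, 0, 0, 0]: the first search skips the leading "11", and every search after that lands on the same "00" at index 0.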

See below for a working approach:

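using System.Collections.Generic;
using System.Text.RegularExpressions;

List<string> ZerosMatch(string input)
{
    var newInput = input;
    var list = new List<string>();
    var regex = new Regex(@"[0]{1,}");
    var match = regex.Match(newInput);
    while (match.Success)
    {
        // Cut the matched zeros out of the string and record them...
        newInput = newInput.Remove(match.Index, match.Value.Length);
        list.Add(match.Value);
        // ...then search again, so the next iteration sees the next run of zeros.
        match = regex.Match(newInput);
    }
    return list;
}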

Still, Regex.Matches is the recommended approach if you want to extract multiple occurrences of a pattern from a string.
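If you want to keep the while loop from the question, a sketch using Match.NextMatch() avoids modifying the string at all: each call resumes scanning where the previous match ended. The class name NextMatchExample is just for illustration, and the pattern 0+ is equivalent to [0]{1,}.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NextMatchExample
{
    public static List<string> ZerosMatch(string input)
    {
        var list = new List<string>();
        var regex = new Regex("0+"); // one or more '0' characters
        var match = regex.Match(input);
        while (match.Success)
        {
            list.Add(match.Value);
            // Continue searching after the end of the current match,
            // instead of removing text and rescanning from the start.
            match = match.NextMatch();
        }
        return list;
    }
}
```

For "001111110000001100110011111" this returns ["00", "000000", "00", "00"], the same result as the Remove-based loop, but in a single pass over the input.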

Answer:

List<string> ZerosMatch(string input)
{
    var newInput = input;   // The newInput variable is not needed; you can use input directly
    var list = new List<string>();
    var regex = new Regex(@"[0]{1,}");
    var matches = regex.Matches(newInput);   // finds every run of zeros in one call

    for (int i = 0; i < matches.Count; i++)
    {
        list.Add(matches[i].Value);
    }
    return list;
}

Answer:

I suggest using Matches instead of Match and querying the results with LINQ (why loop and search again when we can get all the matches in one go?):

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

...

static List<string> ZeroesMatch(string input) => Regex
  .Matches(input ?? "", "0+")
  .Cast<Match>()
  .Select(match => match.Value)
  .ToList();

Here I've simplified the pattern to 0+ (one or more 0 characters) and added ?? "" to avoid an ArgumentNullException on a null string.
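A short usage sketch of the LINQ version, wrapped in a hypothetical ZeroesMatchUsage class so it compiles on its own, also checks the null-handling claim: without the ?? "" guard, the static Regex.Matches throws ArgumentNullException for a null input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ZeroesMatchUsage
{
    // Same expression as the answer above.
    public static List<string> ZeroesMatch(string input) => Regex
        .Matches(input ?? "", "0+")
        .Cast<Match>()
        .Select(match => match.Value)
        .ToList();

    // Returns true if Regex.Matches rejects a null input when no guard is used.
    public static bool ThrowsOnNullWithoutGuard()
    {
        try
        {
            Regex.Matches(null, "0+");
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }
}
```

ZeroesMatch("001111110000001100110011111") returns ["00", "000000", "00", "00"], and ZeroesMatch(null) returns an empty list instead of throwing.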
