C# Regex Anomaly

Time:08-29

I'm a bit perplexed here.

I have a regex that is meant to limit a number to two decimal places. My second and third captures work as expected, but including the first capture ($1) corrupts the result and brings back all the decimal places (I get the original string).

var t = "553.17765";
var from = @"(\d+)(\.*)(\d{0,2})";
var to = "$1$2$3";
var rd = Regex.Replace(t, from,to);
var r = Regex.Match(t, from);

Why can't I get the 553 in the $1 variable? LinqPad

CodePudding user response:

What is happening is that the pattern matches twice within the number: first 553.17, then the leftover 765. You could work around that by looking for the longest match, but it seems simpler to improve your regex instead:

(\d+\.?\d{0,2})

The steps are as follows:

  • The capture group covers the whole number at once.
  • Match one or more digits, greedily.
  • Match a decimal point, either one or none.
  • Match zero to two digits.
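
As a rough illustration of the steps above, the following sketch extracts the first number from a string using that single capture group. The class name `TwoDecimalMatch` and the helper `FirstNumber` are hypothetical names introduced only for this example. Note that the extra digits are cut off, not rounded.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoDecimalMatch
{
    // Hypothetical helper: returns the first number found in the input,
    // cut off (not rounded) after at most two decimal places.
    public static string FirstNumber(string input)
    {
        Match m = Regex.Match(input, @"(\d+\.?\d{0,2})");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FirstNumber("553.17765")); // 553.17
        Console.WriteLine(FirstNumber("42"));        // 42
    }
}
```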

Furthermore, if you want to replace using Regex.Replace you need something to match the rest of the string.

text = Regex.Replace(text, @".*?(\d+\.?\d{0,2}).*", "$1");

dotnetfiddle

CodePudding user response:

Your example does not work because the pattern, as defined, matches twice. The pattern (\d+)(\.*)(\d{0,2}) splits the string 553.17765 as follows:

Match 1: 553.17
    $1 = 553
    $2 = .
    $3 = 17
    Replace 553.17 with 553.17
Match 2: 765
    $1 = 765
    Replace 765 with 765

As expected, the first match includes only two of the decimal places. At that point the match is complete and the regex engine starts looking for the next match, because Replace replaces all matches, not only the first one. Each match is replaced by its own groups in their original order, so by design this replacement changes nothing.
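
To see the two matches for yourself, a small sketch along these lines lists every match the original pattern finds and then runs the same Replace call. The class name `MatchWalkthrough` and the helper `AllMatches` are hypothetical, made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchWalkthrough
{
    // Hypothetical helper: collects the text of every match of the original pattern.
    public static List<string> AllMatches(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(\d+)(\.*)(\d{0,2})"))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Prints 553.17 and then 765, one per line.
        foreach (string s in AllMatches("553.17765"))
        {
            Console.WriteLine(s);
        }

        // Each match is replaced by its own groups, so the input comes back unchanged.
        Console.WriteLine(Regex.Replace("553.17765", @"(\d+)(\.*)(\d{0,2})", "$1$2$3")); // 553.17765
    }
}
```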

By the way, Replace works by finding a match and replacing the whole match with the replacement pattern, so there is no need to include the surrounding text. The problem is that your regex matches too precisely: the match stops after the first two decimal places.

That means that whatever you replace it with will only replace 553.17 and nothing more. For finding decimal numbers this is good. For replacing, not so much: here you want to match the whole number with all its decimal places and then replace it.

So a working replace regex would look like this: (\d+\.\d{1,2})\d*. First, there is only one capture group, as we don't intend to change the order of the numbers. Second, the decimal point is required, as we are only interested in replacing numbers that actually have decimal places. For the same reason we need at least one, and up to two, decimal places inside the group. Every decimal place after that is optional and sits outside the capture group, but it is matched greedily so that the match covers the whole number and the number is replaced completely.

Match 1: 553.17765
    $1 = 553.17
    Replace 553.17765 with 553.17

Note that this regex does not handle thousands separators, if that is required.
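
Put together, the replacement could be sketched as follows. The class name `TwoDecimalReplace` and the helper `Truncate` are hypothetical names used only for this example, and the sample inputs are made up. The extra digits are dropped, not rounded, and numbers without a decimal point are left untouched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoDecimalReplace
{
    // Hypothetical helper: cuts every decimal number in the input to two decimal places.
    // The trailing \d* consumes the extra digits so the whole number is part of the match,
    // while $1 keeps only the part inside the capture group.
    public static string Truncate(string input)
    {
        return Regex.Replace(input, @"(\d+\.\d{1,2})\d*", "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Truncate("553.17765"));          // 553.17
        Console.WriteLine(Truncate("Total: 12.3456 EUR")); // Total: 12.34 EUR
        Console.WriteLine(Truncate("42"));                 // 42 (no decimal point, no match)
    }
}
```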
