Regex replace not removing characters properly

Time:02-24

I have the regular expression:

const regex = /^\d*\.?\d{0,2}$/

and what I believe is its inverse:

const inverse = /^(?!\d*\.?\d{0,2}$)/

The first regex validates that the string is a positive number, optionally with a decimal point and up to two decimal digits (e.g. 150, 14., 7.4, 12.68). The second regex is meant to be the inverse of the first. After some testing I'm fairly confident it gives the expected result: it only matches when the string is anything other than such a number (e.g. 12..05, a5, 54.357).
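A quick way to check that the two patterns are complementary is to run both through `RegExp.prototype.test()` on the example strings from the question. This is a minimal sketch; the sample arrays are simply the examples listed above.

```javascript
// The two patterns from the question.
const regex = /^\d*\.?\d{0,2}$/;
const inverse = /^(?!\d*\.?\d{0,2}$)/;

// Examples taken from the question text.
const valid = ['150', '14.', '7.4', '12.68'];
const invalid = ['12..05', 'a5', '54.357'];

for (const s of [...valid, ...invalid]) {
  // For every input, exactly one of the two patterns should match.
  console.log(s, regex.test(s), inverse.test(s));
}
// e.g. "150 true false" and "12..05 false true"
```

Note that `regex` also accepts the empty string and a lone `.`, so `inverse` rejects both of those.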

My goal is to remove any characters from the string that do not fit the first regex. I thought I could do that this way:

let myString = '123M.45';
let fixed = myString.replace(inverse, '');

But this does not work as intended. To debug, I changed the replacement string to something I would be able to see:

let fixed = myString.replace(inverse, 'ZZZ');

When I do this, fixed becomes: ZZZ123M.45

Any help would be greatly appreciated.

Answer:

I don't think you can do this:

remove any characters from the string that do not fit the first regex

Your regex is anchored with ^ and $, so it describes the entire string, while replace() only replaces the PARTS of the string that the pattern matches. The regex passed to replace() therefore has to match the unwanted characters themselves, not be an inverted version of the validation regex.

What you could do is validate the string with your original regex and, if it is not valid, replace the unwanted characters and validate again.

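const validRegex = /^\d*\.?\d{0,2}$/; // your original validation regex
// replace everything that's not a dot or a digit
const replaceRegex = /[^\d.]/g; // notice the g flag here to match every occurrence
const myString = '123M.45';

let fixed = myString;
if (!validRegex.test(fixed)) {
  fixed = fixed.replace(replaceRegex, ''); // remove unwanted characters
}
console.log(fixed); // 123.45

// validate again
console.log(validRegex.test(fixed)); // true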
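The validate, replace, validate-again flow can be wrapped in a small helper. `sanitize` is a hypothetical name used only for illustration. It returns `null` when stripping the unwanted characters still does not produce a valid string, because deleting letters alone cannot fix input such as `12..05` or `54.357`.

```javascript
// Validation pattern from the question.
const validNumber = /^\d*\.?\d{0,2}$/;
// Anything that is not a digit or a dot (g flag: every occurrence).
const notDigitOrDot = /[^\d.]/g;

// Hypothetical helper: returns a valid string, or null if one cannot be
// produced just by deleting non-digit, non-dot characters.
function sanitize(input) {
  if (validNumber.test(input)) return input;           // already valid
  const stripped = input.replace(notDigitOrDot, '');   // drop letters etc.
  return validNumber.test(stripped) ? stripped : null; // validate again
}

console.log(sanitize('123M.45')); // 123.45
console.log(sanitize('12..05'));  // null (two dots remain)
console.log(sanitize('54.357'));  // null (three decimal digits remain)
```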

Answer:

I think I understand your logic here: you are trying to find a regex that is the inverse of the one that matches a valid string, in the hope that it will let you remove any characters that make the string invalid and leave only the valid part. However, I don't think replace() lets you solve the problem this way. From the MDN docs:

The replace() method returns a new string with some or all matches of a pattern replaced by a replacement.

In your inverse pattern you are using a negative lookahead. Taking a simple example, X(?!Y) can be read as "match X if it is not followed by Y". In your pattern, X is ^ and Y is \d*\.?\d{0,2}$. As I understand it, the reason you get ZZZ123M.45 is that the engine finds the first ^ (i.e., the start of the string) that is not followed by \d*\.?\d{0,2}$. Since 123M.45 does not match Y, the negative lookahead succeeds and the start of the string is matched. Because both ^ and the lookahead are zero-width, that match is an empty string, so "replacing" it inserts ZZZ at the front, and replacing it with '' changes nothing.
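This can be confirmed by asking where `inverse` matches and how long the match is. A negative lookahead consumes no characters, so the only thing `^(?!...)` can ever match is the empty string at position 0. A minimal sketch using `RegExp.prototype.exec()`:

```javascript
const inverse = /^(?!\d*\.?\d{0,2}$)/;
const myString = '123M.45';

// exec() reports what was matched and at which index.
const m = inverse.exec(myString);
console.log(JSON.stringify(m[0]), m.index); // "" 0  (an empty match at the start)

// Replacing an empty match inserts text; it cannot delete anything.
console.log(myString.replace(inverse, ''));    // 123M.45
console.log(myString.replace(inverse, 'ZZZ')); // ZZZ123M.45
```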

That (I think) is an explanation of what you are seeing.

I would propose an alternative solution to your problem that better fits with how I understand the .replace() method. Instead of your inverse pattern, try this one:

const invalidChars = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/g
const myString = '123M..456444';
const fixed = myString.replace(invalidChars, '');

Here I am using a pattern that I think will match the individual characters that you want to remove. Let's break down what this one is doing:

[^\d\.]: match any character that is neither a digit nor a . character

\.(?=\.): match a . character if it is followed by another . character

(?<=\.\d\d)\d*: match any run of digits preceded by a decimal point and two digits (i.e. digits beyond the second decimal place)

Then I join all these with ORs (|) so it will match any one of the above patterns, and I use the g flag so that it will replace all the matches, not just the first one.
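To see what each alternative contributes, the three pieces can be applied one at a time and then combined, using the example input `123M..456444` from above. This is only an illustration. Note that lookbehind assertions such as `(?<=...)` were added in ES2018, so older engines may reject the pattern.

```javascript
const input = '123M..456444';

// The three alternatives from the answer, each with the g flag.
const notDigitOrDot = /[^\d.]/g;         // letters and other stray characters
const repeatedDot = /\.(?=\.)/g;         // a dot that is followed by another dot
const extraDecimals = /(?<=\.\d\d)\d*/g; // digits after the second decimal place

console.log(JSON.stringify(input.match(notDigitOrDot))); // ["M"]
console.log(JSON.stringify(input.match(repeatedDot)));   // ["."]
console.log(JSON.stringify(input.match(extraDecimals))); // ["6444"]

// Combined with | and the g flag, every match is removed.
const invalidChars = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/g;
console.log(input.replace(invalidChars, '')); // 123.45

// Without the g flag, replace() stops after the first match (the "M").
const firstOnly = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/;
console.log(input.replace(firstOnly, '')); // 123..456444
```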

I am not sure if this will cover all your use cases, but I thought I would give it a shot. Here's a link to a breakdown that might be more helpful than mine, and you can use this tool to tweak the pattern if necessary.
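One way to probe the use cases is to run the pattern over a few awkward inputs and re-check each result against the original validation regex. The inputs below are made up for illustration. They suggest that the pattern handles stray letters and runs of dots, but not a second, non-adjacent dot, and that the lookbehind inspects the original string rather than the already-cleaned one.

```javascript
const validNumber = /^\d*\.?\d{0,2}$/;
const invalidChars = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/g;

// Hypothetical test inputs, chosen to exercise each alternative.
const samples = ['123M.45', '12...5', '1.2.3', '12.3M4567'];

for (const s of samples) {
  const fixed = s.replace(invalidChars, '');
  console.log(s, '->', fixed, validNumber.test(fixed));
}
// 123M.45 -> 123.45 true
// 12...5 -> 12.5 true
// 1.2.3 -> 1.2.3 false          (the non-adjacent second dot is kept)
// 12.3M4567 -> 12.34567 false   (the lookbehind sees "3M4", not "34")
```

If every input must end up valid, validating the result afterwards, as the other answer suggests, is still a useful safety net.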
