I'm pretty bad at Regex (C#), so I break things down into parts. The goal of the following series of Regex statements is to take an arbitrary string and reduce it to a lowercase string of the form "this is a test of 4mg/cc susp".
This is what I've been doing:
// Test string
string str1 = @" This is\ 'a' test of 4mg/cc susp ";
// Remove special characters except for space and /
str1 = Regex.Replace(str1, @"[^0-9a-zA-Z /]+", "");
// Remove all but one space from within the string. Trim the ends.
str1 = Regex.Replace(str1.Trim(), @"\s+", " ");
// Convert all to lower case
str1 = str1.ToLower();
Is there a single Regex (C#) statement that can accomplish all the above?
Answer:
I would argue that trying to combine both patterns into one would make it less readable. You could keep using two calls to Regex.Replace() and just append .ToLower() to the second one:
// Remove special characters except for space and /
str1 = Regex.Replace(str1, @"[^0-9a-zA-Z /]+", "");
// Remove all but one space, trim the ends, and convert to lower case.
str1 = Regex.Replace(str1.Trim(), @"\s+", " ").ToLower();
//                                             ^^^^^^^^^
That said, if you really have to use a one-liner, you could write something like this:
str1 = Regex.Replace(str1, @"[^A-Za-z0-9 /]+|( )+", "$1").Trim().ToLower();
This matches either a run of characters outside the allowed set (the negated character class) or a run of one or more space characters. The space is placed in a capturing group, and each match is replaced with whatever group 1 captured: nothing when the first alternative matched, or a single space when the second one did.
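To make the role of group 1 concrete, here is a small sketch that lists each match and what the group holds. The class name GroupDemo and the sample input are made up for illustration; only the pattern comes from the one-liner above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Pattern from the one-liner: disallowed characters, or a run of spaces.
    public const string Pattern = @"[^A-Za-z0-9 /]+|( )+";

    public static void Main()
    {
        string input = "4mg/cc!!   susp"; // hypothetical sample input

        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Group 1 only participates when the second alternative matched.
            // For a repeated group, Value holds the last captured space.
            Console.WriteLine(
                $"match=[{m.Value}] group1=[{m.Groups[1].Value}] participated={m.Groups[1].Success}");
        }
        // match=[!!] group1=[] participated=False
        // match=[   ] group1=[ ] participated=True

        Console.WriteLine(Regex.Replace(input, Pattern, "$1")); // 4mg/cc susp
    }
}
```

In .NET, a `$1` substitution for a group that did not take part in the match yields an empty string, which is why the disallowed characters simply disappear.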
For the sake of completeness, if you also want to handle the trimming with regex (and make the pattern even less readable), you could use:
str1 = Regex.Replace(str1, @"[^A-Za-z0-9 /]+|^ +| +$|( )+", "$1").ToLower();
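If you want to check the variants against each other, something like the following should work. This is only a sketch: the Normalizer class and its method names (TwoStep, OneLiner, RegexTrim) are hypothetical wrappers around the three snippets above, and the second input is an extra edge case that is not in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Normalizer
{
    // Two calls to Regex.Replace(), with ToLower() appended to the second.
    public static string TwoStep(string s)
    {
        s = Regex.Replace(s, @"[^0-9a-zA-Z /]+", "");
        return Regex.Replace(s.Trim(), @"\s+", " ").ToLower();
    }

    // Single pattern; trimming is still done by string.Trim().
    public static string OneLiner(string s) =>
        Regex.Replace(s, @"[^A-Za-z0-9 /]+|( )+", "$1").Trim().ToLower();

    // Single pattern that also strips leading and trailing spaces.
    public static string RegexTrim(string s) =>
        Regex.Replace(s, @"[^A-Za-z0-9 /]+|^ +| +$|( )+", "$1").ToLower();
}

public static class Program
{
    public static void Main()
    {
        string input = @" This is\ 'a' test of 4mg/cc susp ";
        Console.WriteLine($"[{Normalizer.TwoStep(input)}]");   // [this is a test of 4mg/cc susp]
        Console.WriteLine($"[{Normalizer.OneLiner(input)}]");  // [this is a test of 4mg/cc susp]
        Console.WriteLine($"[{Normalizer.RegexTrim(input)}]"); // [this is a test of 4mg/cc susp]

        // Edge case: a removed character with a space on each side.
        string edge = @"a \ b";
        Console.WriteLine($"[{Normalizer.TwoStep(edge)}]");    // [a b]
        Console.WriteLine($"[{Normalizer.OneLiner(edge)}]");   // [a  b]
    }
}
```

The edge case shows one way the single-pass patterns are not strictly equivalent to the two-step version: each run of spaces is matched separately, so when a removed character sits between two spaces, both spaces survive. If your input can look like that, the two-call version is the safer choice.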