I have a string containing a long text whose words are separated by white space, as usual.
However, people seem to mix different white-space characters within the same text!
In the screenshot you can see regular spaces (hex code 0020) and another kind of white space, highlighted in yellow (hex code 00A0).
Given that a string can contain many more than two kinds of white-space characters, how can I unify them (like a string replace) into a single kind of space?
In other words: "Replace every kind of white space with the hex code 0020 space."
CodePudding user response:
Several ways to choose from:
Regular expressions:
using System.Text.RegularExpressions;
...
// If you want to collapse runs of white space as well: "A  B   C" -> "A B C"
string result = Regex.Replace(text, @"\s+", " ");
Or
using System.Text.RegularExpressions;
...
// If you want to preserve double spaces
string result = Regex.Replace(text, @"\s", " ");
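To make the difference between the two patterns concrete, here is a minimal sketch. The class and method names (RegexWhiteSpaceDemo, Collapse, Preserve) are hypothetical and only serve the illustration; the two Regex.Replace calls are the ones from this answer. It relies on the default .NET behavior where \s also matches Unicode separators such as U+00A0.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper names, for illustration only.
public static class RegexWhiteSpaceDemo
{
    // Collapses every run of white-space characters into one U+0020 space.
    public static string Collapse(string text) =>
        Regex.Replace(text, @"\s+", " ");

    // Replaces each white-space character one-for-one, so the length is unchanged.
    public static string Preserve(string text) =>
        Regex.Replace(text, @"\s", " ");
}
```

For the input "A\u00A0\u00A0B\tC", Collapse should give "A B C" and Preserve should give "A  B C" (two spaces after the A). Note that both variants also turn tabs and line breaks into spaces, since \s matches them too.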
Linq:
using System.Linq;
...
string result = string.Concat(text.Select(c => char.IsWhiteSpace(c) ? ' ' : c));
Loop:
using System.Text;
...
StringBuilder sb = new StringBuilder(text.Length);
foreach (char c in text)
    sb.Append(char.IsWhiteSpace(c) ? ' ' : c);
string result = sb.ToString();
CodePudding user response:
You can use a StringBuilder, which lets you index and replace single characters in place. Char.IsWhiteSpace recognizes many different white-space characters, including the no-break space (hex code 00A0) as well as tabs and line breaks.
var sb = new StringBuilder(text);
for (int i = 0; i < sb.Length; i++) {
    if (Char.IsWhiteSpace(sb[i])) {
        sb[i] = ' ';
    }
}
text = sb.ToString();
Note that you cannot do this on a string directly, because strings are immutable.
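If you need this in several places, the in-place StringBuilder approach can be wrapped in a small helper. This is only a sketch: the class name WhiteSpaceNormalizer and the method name Normalize are made up for the example, and it assumes the input is not null.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class WhiteSpaceNormalizer
{
    // Returns a copy of text in which every character that
    // char.IsWhiteSpace reports as white space is replaced by U+0020.
    // Assumes text is not null.
    public static string Normalize(string text)
    {
        var sb = new StringBuilder(text);
        for (int i = 0; i < sb.Length; i++)
        {
            if (char.IsWhiteSpace(sb[i]))
            {
                sb[i] = ' '; // overwrite in place; the length stays the same
            }
        }
        return sb.ToString();
    }
}
```

Usage would then be a single call such as string clean = WhiteSpaceNormalizer.Normalize(text);. For example, "a\u00A0b\u2003c\td" should come back as "a b c d".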