I need to split my text into each word, space, and new line. Although the words and spaces are properly working, the \n is taking up two spaces only if it's not after a word. Example: "\nTest\nword", here, the first \n takes up two spaces while the second one takes up one.
How would I write the proper regex?
My code:
string delimiterChars = "([ \r\n])";
wordArray = Regex.Split(myTexy, delimiterChars);
For context, I am using Unity.
CodePudding user response:
You can use regular expressions to remove leading delimiter characters.
var myTexy = "\nTest\nword";
string delimiterChars = "([ \r\n])";
myTexy = Regex.Replace(myTexy, "^" delimiterChars, "");
var wordArray = Regex.Split(myTexy, delimiterChars);
The "^" regex option says only look for these characters at the beginning of the string.
Also, just so you are aware the behavior you are seeing is intended and is documented
CodePudding user response:
Regex.Split
will always produce empty items where the matches are consecutive, or when they are at the start/end of string.
Instead, you can use a matching and extracting approach:
string delimiterChars = "[^ \r\n] |[ \r\n]";
string[] wordArray = Regex.Matches(myTexy, delimiterChars)
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
The [^ \r\n] |[ \r\n]
regex matches one or more chars other than a space, CR and LF, or a space, CR or an LF char.