Split a text with multiple strings instead of period mark using regex

Time:11-15

I tried this code:

string Input = TextBox1.Text;

string[] splitX = Regex.Split(Input, @"(?<=[|if|and|but|so|when|])");

The regular expression @"(?<=[.?!])" is often used to split a text into sentences, but I need to use words as delimiters to split the text.

CodePudding user response:

It looks like you're using a character class where you need a group with alternation. Square brackets define a character class, which matches any single one of the enclosed characters. For example, in the other regex you provided, [.?!] matches either ., ?, or ! (the period needs no escaping there, because inside a character class . is a literal period). Thus, your regex splits after any single one of the characters |, i, f, a, n, d, and so on. Duplicate characters in a class, like your repeated n and |, are harmless and simply ignored, but the point is that this is the wrong regex construct to use.

The solution is simple: replace your square brackets with parentheses. This turns that section of the regex into a group, which can hold several alternatives separated by |. Only put the | between alternatives, so remove the first and last one (a leading or trailing | adds an empty alternative that matches everywhere). Prefer a non-capturing group, (?:...), because Regex.Split also returns the text of any capture group as extra array elements. The corrected regex would be:

(?<=(?:if|and|but|so|when))
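
As a minimal sketch of how that lookbehind behaves with Regex.Split, the helper below splits right after each delimiter word and keeps the word at the end of the preceding piece. The class and method names (WordSplitter, SplitAfterWords) are hypothetical, and the \b word boundaries and the trimming are additions that go beyond the regex above; they assume you only want whole-word matches and no surrounding whitespace.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class WordSplitter
{
    // Zero-width lookbehind: the split point sits right after a delimiter word,
    // so the word stays at the end of the preceding piece.
    // (?: ... ) is non-capturing, so Regex.Split does not return the matched
    // word as a separate element; \b restricts matches to whole words.
    private static readonly Regex AfterWord =
        new Regex(@"(?<=\b(?:if|and|but|so|when)\b)");

    public static string[] SplitAfterWords(string input)
    {
        return AfterWord.Split(input)
            .Select(piece => piece.Trim())       // drop the whitespace left around each piece
            .Where(piece => piece.Length > 0)    // drop empty pieces, e.g. when the text ends with a delimiter
            .ToArray();
    }
}
```

Calling WordSplitter.SplitAfterWords("I was tired but I kept going so we arrived when it got dark") should give four pieces: "I was tired but", "I kept going so", "we arrived when", and "it got dark". The match is case-sensitive as written; passing RegexOptions.IgnoreCase to the Regex constructor would also catch "If" or "But" at the start of a sentence.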

CodePudding user response:

The question isn't specifically tagged as regex, and you don't say that the split has to be performed in a regex operation, so a plain string split may be enough. You wrote:

"But I need to use words as a delimiter to split the text."

Multiple words can be used as delimiters to identify where you want to split up your string like so:

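string[] delimiters = { "if", "and", "but", "so", "when" };
// Input is the string from the question. Split drops the delimiter words from the
// result and also matches them inside longer words (e.g. "so" in "also").
string[] parts = Input.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);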

So perhaps this approach gets you where you need to go, or perhaps a combination of approaches will (regex first, then this string split technique).
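
To make the trade-off concrete, here is a rough sketch contrasting the plain String.Split call with a regex split on whole words. The class and method names (DelimiterDemo, SplitOnSubstrings, SplitOnWholeWords) are hypothetical, and trimming the pieces is an assumption about the output you want.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class DelimiterDemo
{
    private static readonly string[] Delimiters = { "if", "and", "but", "so", "when" };

    // Plain substring split: delimiters are removed, and a delimiter is also
    // matched inside a longer word such as "also".
    public static string[] SplitOnSubstrings(string input)
    {
        return input.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex split on whole words only; the delimiter words are still removed.
    public static string[] SplitOnWholeWords(string input)
    {
        // Builds \b(?:if|and|but|so|when)\b from the same delimiter list.
        string pattern = @"\b(?:" + string.Join("|", Delimiters.Select(Regex.Escape)) + @")\b";
        return Regex.Split(input, pattern)
            .Select(piece => piece.Trim())
            .Where(piece => piece.Length > 0)
            .ToArray();
    }
}
```

For the input "I also ran and she walked", SplitOnSubstrings should return "I al", " ran ", and " she walked", because "so" is found inside "also". SplitOnWholeWords should return "I also ran" and "she walked".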
