I need a Regex expression that matches the words or phrases in an English language list of things, in one of these forms:
- "Some words" would match "Some words"
- "Some words and some other words" would match "Some words" and "some other words"
- "Some words, more words and some other words" would match "Some words", "more words", and "some other words"
- "Some words, more words, and some other words" would match "Some words", "more words", and "some other words"
In other words, the Regex allows me to identify each phrase in an English language list of phrases, where all but the final phrase (if there are more than two phrases) are separated by commas, and the final "and" may or may not be preceded by a comma.
Getting the comma-separated matches is easy:
[^,]+
but I can't figure out how to deal with the optional final "and" separator (without a preceding comma).
CodePudding user response:
One way to do this is to split the string on "and" (optionally preceded by a comma) or on a comma:
// Requires: using System; using System.Text.RegularExpressions;
string[] inp = new string[] {
    "Some words",
    "Some words and some other words",
    "Some words, more words and some other words",
    "Some words, more words, and some other words"
};
foreach (string s in inp) {
    // Split on "and" (preceded by a comma or by whitespace), or on a plain comma.
    string[] phrases = Regex.Split(s, @"(?:,\s*|\s+)and\s+|,\s*");
    Console.WriteLine(string.Join("\n", phrases));
}
Output:
Some words
Some words
some other words
Some words
more words
some other words
Some words
more words
some other words
CodePudding user response:
You can use the following pattern in Regex.Split:
\s*(?:(?:,\s*)?\band\s+|,\s*)
See the regex demo.
Details:
- \s* - zero or more whitespaces
- (?:(?:,\s*)?\band\s+|,\s*) - one of the two alternatives:
  - (?:,\s*)?\band\s+ - an optional sequence of a comma and zero or more whitespaces, and then the whole word "and" with one or more whitespace chars right after
  - | - or
  - ,\s* - a comma and zero or more whitespaces.
See the C# demo:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var texts = new List<string> {
            "Some words",
            "Some words and some other words",
            "Some words, more words and some other words",
            "Some words, more words, and some other words"
        };
        // Separator: optional comma + whole word "and" + whitespace, or a plain comma.
        var pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";
        foreach (var text in texts)
        {
            // Drop empty or whitespace-only pieces left over by the split.
            var result = Regex.Split(text, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();
            Console.WriteLine("'{0}' => ['{1}']", text, string.Join("', '", result));
        }
    }
}
Output:
'Some words' => ['Some words']
'Some words and some other words' => ['Some words', 'some other words']
'Some words, more words and some other words' => ['Some words', 'more words', 'some other words']
'Some words, more words, and some other words' => ['Some words', 'more words', 'some other words']
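It may also help to see why the pattern uses \b before "and" and \s+ after it. Here is a minimal sketch, assuming the pattern is written with \s+ after "and" as described in the details above. The PhraseSplitter class and the SplitPhrases method are hypothetical names introduced only for this illustration, and the match is case-sensitive, so a capitalized "And" is not treated as a separator.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the answer above) wrapping the same pattern.
public static class PhraseSplitter
{
    // Optional comma + whole word "and" + one or more whitespaces, or a plain comma.
    private static readonly Regex Separator =
        new Regex(@"\s*(?:(?:,\s*)?\band\s+|,\s*)");

    public static string[] SplitPhrases(string text)
    {
        // Split on the separator and drop empty or whitespace-only pieces.
        return Separator.Split(text)
                        .Where(p => !string.IsNullOrWhiteSpace(p))
                        .ToArray();
    }
}
```

With this helper, PhraseSplitter.SplitPhrases("Sand, bandages and candles") gives "Sand", "bandages", "candles": the \b stops the "and" inside "Sand" and "bandages" from being treated as a separator. Likewise, "cats, android phones" gives "cats" and "android phones", because the \s+ requires whitespace right after "and".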
CodePudding user response:
You can try this:
[some|Some|more]+\s(?:[a-z]+)?\s?words
Hope it will help you!