I'm working on a side project in which I need to parse a String to obtain substrings.
I have a REST API with a String parameter in the payload. The value of this String can follow any of the patterns listed below:
- [Name]
- [Name 1], [Name 2]
- [Name 1] and [Name 2]
- [Name 1], [Name 2] and [Name 3]
- [Name 1], [Name 2] and [Name 3], [Role]
Options I tried:
- Including another parameter in the request payload that describes the format of the String value. For example, if a string value of pattern #4 is to be passed as input, here is the payload I would expect:
{
"Value" : "Name 1, Name 2 and Name 3",
"Format": 4
}
Here, the burden is on the client to determine the format and set the format value accordingly, which is definitely not a good approach.
- Somehow determine the format (for example, count the number of commas and occurrences of the AND keyword) and then use a regex dedicated to that format.
For example, if the string contains at least one comma, an occurrence of the AND keyword, and a comma after the AND keyword, it could be pattern #5 (described in the list above). So use the regex pattern:
([a-zA-Z]+( [a-zA-Z]+)+),([a-zA-Z]+( [a-zA-Z]+)+),[a-zA-Z]+
This approach does work, but it is still far too rigid to be practical. For example, if the value contains 4 names (rather than 3), the pattern above won't work.
Is there a more generic regex pattern that could satisfy each of the aforementioned patterns?
CodePudding user response:
Here is a generic regex pattern which covers all 5 types of inputs:
^\[.*?\](?:(?:,|\s+and\s+)\s*\[.*?\](?:\s+and\s+\[.*?\])*)*$
Explanation of regex:
^                     start of string
\[.*?\]               match [Name]
(?:
    (?:,|\s+and\s+)   match either comma or "and" separator
    \s*               optional whitespace
    \[.*?\]           another [Name 2]
    (?:
        \s+and\s+     "and" separator
        \[.*?\]       more [Name] terms
    )*                zero or more
)*                    zero or more
$                     end of string
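To see how this pattern behaves, here is a minimal Java sketch that runs it against the five sample inputs from the question. The class name BracketListCheck and the method matchesList are made up for illustration, and the sketch assumes the values really arrive with the square brackets included.

```java
import java.util.regex.Pattern;

public class BracketListCheck {
    // Pattern from the answer above; in a Java string literal each backslash is doubled.
    private static final Pattern LIST = Pattern.compile(
        "^\\[.*?\\](?:(?:,|\\s+and\\s+)\\s*\\[.*?\\](?:\\s+and\\s+\\[.*?\\])*)*$");

    // Hypothetical helper: true when the whole value matches the pattern.
    static boolean matchesList(String value) {
        return LIST.matcher(value).matches();
    }

    public static void main(String[] args) {
        // The five formats listed in the question (assumed to include the brackets).
        String[] samples = {
            "[Name]",
            "[Name 1], [Name 2]",
            "[Name 1] and [Name 2]",
            "[Name 1], [Name 2] and [Name 3]",
            "[Name 1], [Name 2] and [Name 3], [Role]"
        };
        for (String sample : samples) {
            System.out.println(matchesList(sample) + "  " + sample);
        }
    }
}
```

Note that .*? can also run across a closing bracket when that is the only way to reach a match, so this pattern is fairly permissive about what sits between the first [ and the last ].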
CodePudding user response:
You could write a pattern that repeatedly matches everything between the square brackets:
^\[[^\]\[]*](?:(?:,| and) \[[^\]\[]*])*$
In parts, the pattern matches:
^                 Start of string
\[[^\]\[]*]       Match from [ to ], with no square brackets in between
(?:               Non capture group
    (?:,| and)    Match either a comma or " and", followed by a space
    \[[^\]\[]*]   Match from [ to ] again
)*                Close the non capture group and optionally repeat
$                 End of string
In Java, with the backslashes doubled:
String regex = "^\\[[^\\]\\[]*](?:(?:,| and) \\[[^\\]\\[]*])*$";
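Since the question asks for the substrings and not just a yes/no check, one possible way to use this pattern is to validate the whole value first and then pull out each bracketed item with a second, smaller pattern. The sketch below is only an illustration: the class NameListParser, the method parse and the ITEM pattern are hypothetical names, and it treats a trailing [Role] as just another item, since nothing in the input marks it as different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameListParser {
    // Whole-value check, using the pattern from the answer above.
    private static final Pattern LIST =
        Pattern.compile("^\\[[^\\]\\[]*](?:(?:,| and) \\[[^\\]\\[]*])*$");
    // A single bracketed item; group 1 is the text between the brackets.
    private static final Pattern ITEM = Pattern.compile("\\[([^\\]\\[]*)]");

    // Hypothetical helper: returns the bracketed items in order of appearance.
    static List<String> parse(String value) {
        if (!LIST.matcher(value).matches()) {
            throw new IllegalArgumentException("Unrecognised format: " + value);
        }
        List<String> items = new ArrayList<>();
        Matcher matcher = ITEM.matcher(value);
        while (matcher.find()) {
            items.add(matcher.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        // Prints [Name 1, Name 2, Name 3, Role]
        System.out.println(parse("[Name 1], [Name 2] and [Name 3], [Role]"));
    }
}
```

Because the items are collected in a loop, the same code handles four or more names without a dedicated pattern per format.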