I have the following string:
[Example] öäüß asdf 1234 (1aö) (not necessary),
Explanation:
[Example]
optional, not needed
öäüß asdf 1234
the most important part, which I need. Any character, digit or special character, including German characters like äÄöÖüÜß, can appear here. A greedy selection might be the best way to avoid problems with characters like the German ones, right?
(1aö)
optional and needed
(not necessary)
optional, not needed. If it appears, it could be (not ...) or (unusual)
,
the comma can be optional, too, and is also not needed.
I use the following RegEx: /(?:\[.*\]\s)?(?<name>.*?)(?:\s\([not|unusual].*?\))?\,/g
The problems:
When I use the optional quantifier ? on the comma, the match splits the whole string into separate characters.
When I change the non-greedy selection in the name group to a greedy one, the optional comma is separated, but now the example string starting with ö is selected up to the end.
The string inside the first pair of round brackets () can start with upper or lower case. At the moment I can only recognise upper case.
Here's my attempt at regex101 with a bunch of examples: https://regex101.com/r/Lx2anw/1
Sorry for the quite specific question, but I've reached the limit of my knowledge ...
Does anyone have suggestions what i can do here?
CodePudding user response:
It will work if you put every expression that you don't want to capture in its own non-capturing group. Your expression will look like this:
/(?:\[.*\]\s)?(?<name>.+?)(?:\s\(not \w+\))?(?:\s\(unusual\))?,?$/gm
https://regex101.com/r/a7Qtvw/1
CodePudding user response:
You can use
^(?:\[.*?]\s)?(?<name>.*?)(?:\s\((?:not|unusual)[^()]*\))?,?\s*$
See the regex demo.
Details:
^ - start of string
(?:\[.*?]\s)? - an optional sequence of [...] and a whitespace
(?<name>.*?) - Group "name": any zero or more chars, as few as possible
(?:\s\((?:not|unusual)[^()]*\))? - an optional sequence of a whitespace, (, not or unusual, then zero or more chars other than ( and ), and then a ) char
,? - an optional comma
\s* - zero or more whitespaces
$ - end of string
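To try this pattern outside regex101, here is a minimal Python sketch. It assumes Python's re module, which needs the named group written as (?P<name>...) instead of (?<name>...). The helper extract_name and the sample strings are only illustrative and are not part of the answer.

```python
import re
from typing import Optional

# Same pattern as above, with the named group in Python syntax.
# re.MULTILINE lets ^ and $ anchor at every line, like the m flag.
PATTERN = re.compile(
    r"^(?:\[.*?]\s)?(?P<name>.*?)(?:\s\((?:not|unusual)[^()]*\))?,?\s*$",
    re.MULTILINE,
)


def extract_name(line: str) -> Optional[str]:
    """Return the content of the "name" group, or None if nothing matches."""
    match = PATTERN.search(line)
    return match.group("name") if match else None


samples = [
    "[Example] öäüß asdf 1234 (1aö) (not necessary),",
    "öäüß asdf 1234 (unusual)",
    "Öäü asdf,",
]
for sample in samples:
    print(repr(extract_name(sample)))
```

The lazy name group only grows until the optional tail (the "not"/"unusual" part, the comma and trailing whitespace) can reach the end of the line, which is why the $ anchor matters here.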
CodePudding user response:
Your pattern matches the rest of the line in the name group because everything that follows the group in the pattern is optional.
Note that you use a character class [not|unusual], which matches a single one of the listed characters,
but you should use a grouping if you want to match one of the alternatives, like (?:not|unusual)
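A small Python sketch of the difference between the two, assuming the re module. The test strings and the helper matches are made up for illustration.

```python
import re

# Character class: after "(" it accepts ONE character out of
# n, o, t, |, u, s, a, l (the | is a literal inside a class).
char_class = re.compile(r"\([not|unusual]")

# Non-capturing group: after "(" it accepts the word "not" or "unusual".
group = re.compile(r"\((?:not|unusual)")


def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


print(matches(char_class, "(other)"))   # True: "o" is in the class
print(matches(group, "(other)"))        # False: neither "not" nor "unusual"
print(matches(group, "(not necessary)"))  # True
```

So the character class also accepts brackets such as (other) or (large), which is probably not what was intended.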
You might also match any character except parentheses or a comma that is at the end of the string.
Then match an optional part between parentheses.
^(?:\[[^\][\n]*\]\s)?(?<name>(?:(?!,\s*$)[^\n()])+(?:\([^()\n]*\))?)
Explanation
^
Start of string
(?:\[[^\][\n]*\]\s)?
Optionally match [...] followed by a whitespace char
(?<name>
Group name
(?:
Non capture group
(?!,\s*$)[^\n()]
If we are not looking at a trailing comma, match any character except ( ) or a newline
)+
Close the non capture group and repeat 1 or more times to not match an empty line
(?:\([^()\n]*\))?
Optionally match a part from (...)
)
Close group name
If the first part between parentheses should not start with the words not or unusual, you can assert that using a negative lookahead (?!not\b|unusual\b)
^(?:\[[^\][\n]*\]\s)?(?<name>(?:(?!,\s*$)[^\n()])+(?:\((?!not\b|unusual\b)[^()\n]*\))?)
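A minimal Python sketch of the lookahead variant. It makes a few assumptions: the re module is used, so the group is written as (?P<name>...); the repetition after the first non-capturing group is + ("1 or more", as in the explanation); and the [ inside the character class is escaped, which does not change what it matches. The helper extract_name is hypothetical, and it strips the trailing space that the group keeps when a rejected (not ...) or (unusual) part follows.

```python
import re
from typing import Optional

# Lookahead variant of the pattern, in Python syntax.
# re.MULTILINE lets ^ and $ work per line.
PATTERN = re.compile(
    r"^(?:\[[^\]\[\n]*\]\s)?"                      # optional [...] prefix
    r"(?P<name>(?:(?!,\s*$)[^\n()])+"              # chars up to "(" or a trailing comma
    r"(?:\((?!not\b|unusual\b)[^()\n]*\))?)",      # optional (...) not starting with not/unusual
    re.MULTILINE,
)


def extract_name(line: str) -> Optional[str]:
    """Return the stripped "name" group, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group("name").strip() if match else None


samples = [
    "[Example] öäüß asdf 1234 (1aö) (not necessary),",
    "öäüß asdf 1234 (unusual),",
    "Öäü asdf,",
]
for sample in samples:
    print(repr(extract_name(sample)))
```

Note that the lookahead is case sensitive as written, so a part such as (Not ...) would still be included in the group.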