Can you please help me to understand how to do the following?
I have strings in three formats:
- "Section_1: hello & goodbye | section_2: kuku"
- "Section_1: hello & goodbye & hola | section_2: kuku"
- "Section_1: hello | section_2: kuku"
I want to get the result:
- Group section_1: "hello & goodbye", Group section_2: "kuku"
- Group section_1: "hello & goodbye & hola", Group section_2: "kuku"
- Group section_1: "hello", Group section_2: "kuku"
Now I have the regex (but it's not working for me because of the '&'):
Section_1:\s*(?<section_1>\w+)(\s*\|\s*(Section_2:(\s*(?<section_2>.*))?)?)?
Note: the regex captures 2 groups: "section_1" and "section_2".
The question is: how can I read a substring that can contain zero or more occurrences of " & {word}"?
Thanks in advance
CodePudding user response:
As established in the comments, the ' & ' combination acts as a delimiter between words. There are probably a ton of ways to write a pattern that captures these substrings, but to me they fall into two groups: extensive or simple. If you need to validate the input more thoroughly, you could use:
^section_1:\s*(?<section_1>[a-z]+(?:\s&\s[a-z]+)*)\s*\|\s*section_2:\s*(?<section_2>[a-z]+(?:\s&\s[a-z]+)*)$
See an online demo. The pattern means:
^ - Start-line anchor;
section_1:\s* - Match 'Section_1:' literally, followed by 0+ whitespace characters;
(?<section_1>[a-z]+(?:\s&\s[a-z]+)*) - A named capture group that matches [a-z]+ as 1+ upper/lower letters (case-insensitive flag), followed by a nested non-capture group (?:\s&\s[a-z]+)* that matches 0+ times, to test for any delimiter as per above followed by another word;
\s*\|\s*section_2:\s* - Match 0+ whitespace characters, a literal pipe symbol and literally 'Section_2:', up to;
(?<section_2>[a-z]+(?:\s&\s[a-z]+)*) - A 2nd named capture group that matches the same pattern as the named capture group above;
$ - End-line anchor.
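As a rough illustration, here is how the strict pattern could be tried out in Python. This is only a sketch under a few assumptions: Python's `re` module writes named groups as `(?P<name>...)` rather than `(?<name>...)`, the case-insensitive flag mentioned above is passed as `re.IGNORECASE`, and the helper name `parse_sections` is made up for this example.

```python
import re

# Strict pattern: each section is one word, optionally followed by
# more words that are each preceded by the ' & ' delimiter.
# Note: Python spells named groups (?P<name>...), unlike (?<name>...) above.
SECTION_RE = re.compile(
    r"^section_1:\s*(?P<section_1>[a-z]+(?:\s&\s[a-z]+)*)"
    r"\s*\|\s*section_2:\s*(?P<section_2>[a-z]+(?:\s&\s[a-z]+)*)$",
    re.IGNORECASE,  # lets 'Section_1' and 'section_1' both match
)


def parse_sections(text):
    """Return a dict of the two named groups, or None if the input is invalid."""
    match = SECTION_RE.match(text)
    return match.groupdict() if match else None


samples = [
    "Section_1: hello & goodbye | section_2: kuku",
    "Section_1: hello & goodbye & hola | section_2: kuku",
    "Section_1: hello | section_2: kuku",
]
for sample in samples:
    print(parse_sections(sample))
```

Because the pattern is anchored at both ends, an input with a dangling delimiter such as "Section_1: hello &| section_2: kuku" is rejected instead of being partially matched.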
Note: As mentioned, there are a ton of different patterns one could use, depending on how specific you need to be about validating input. For example:
\s*(?<section_1>[^:|]+?)\s*\|\s*[^:]*:\s*(?<section_2>.+)
may also work.
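The looser pattern can be checked the same way. Again this is only a sketch: it assumes Python's `(?P<name>...)` group syntax, uses `re.search` because the pattern is not anchored, and `parse_loose` is a hypothetical helper name.

```python
import re

# Loose pattern: section_1 is any text without ':' or '|' before the pipe,
# section_2 is everything after the next 'label:' following the pipe.
LOOSE_RE = re.compile(
    r"\s*(?P<section_1>[^:|]+?)\s*\|\s*[^:]*:\s*(?P<section_2>.+)"
)


def parse_loose(text):
    """Return a dict of the two named groups, or None if nothing matches."""
    # search() is needed: the pattern is unanchored and can only start
    # matching after the first ':' in the input.
    match = LOOSE_RE.search(text)
    return match.groupdict() if match else None


samples = [
    "Section_1: hello & goodbye | section_2: kuku",
    "Section_1: hello & goodbye & hola | section_2: kuku",
    "Section_1: hello | section_2: kuku",
]
for sample in samples:
    print(parse_loose(sample))
```

The trade-off is that this version does not validate the words or the ' & ' delimiter at all, so it also accepts content such as digits or extra spaces inside a section.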