RegExp: match everything till next occurrence

Time:12-20

I'm trying to split lyrics into sections using a RegExp, with the section name in group(1) and the section lyrics in group(2):

#Chorus              <-- section name [group(1)]
This is the chorus   <-- section lyrics [group(2)]
Got no words for it  <-- [group(2)]

#Verse               <-- next section name
This is the verse    <-- next section lyrics

I've managed to capture the first section name in group(1), but group(2) then matches everything after it, including all the following sections.

List<SongSection> _sections = [];

RegExp regExp = RegExp(r'\n#([a-zA-Z0-9]+)\n((.|\n)*)', multiLine: true);
List<RegExpMatch> matches = regExp.allMatches('\n' + lyrics).toList();

for (RegExpMatch match in matches) {
  _sections.add(
    SongSection(
      name: match.group(1)!,
      lyrics: match.group(2)!.trim(),
    ),
  );
}

print(_sections.toString());

OUTPUT:
[SongSection(name: 'Chorus', lyrics: 'This is the chorus\nGot no words for it\n\n#Verse\nThis is the verse')]

How can I always match everything up to the next occurrence of group(1)?

Answer:

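You could match the # and the allowed characters for the name, and for the lyrics capture all following lines that are not themselves a section header (a line consisting only of # and the name characters).

Because group 2 starts with the newline that follows the section name, remove it from the group 2 value afterwards, for example with trim(). The $ in the lookahead has to match at the end of each line, so in Dart the RegExp needs multiLine: true.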

#([a-zA-Z0-9]+)((?:\n(?!#[a-zA-Z0-9]+$).*)*)
  • #([a-zA-Z0-9]+) Match # and capture 1 or more of the listed characters in group 1
  • ( Capture group 2
    • (?: Non-capture group
      • \n Match a newline
      • (?!#[a-zA-Z0-9]+$) Assert that the line is not # followed by 1 or more of the listed characters up to the end of the line
      • .* Match the whole line
    • )* Close the non-capture group and optionally repeat it
  • ) Close group 2
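A minimal Dart sketch of the pattern above plugged into the question's loop. The SongSection class here is a hypothetical stand-in, since the question never shows its definition, and parseSections is an illustrative helper name, not part of any library. Unlike the question's code, it does not prepend '\n' to the input, because the pattern starts matching at the # itself.

```dart
// Hypothetical model class: the question constructs SongSection(name:, lyrics:)
// but never shows its definition, so this is an assumed minimal version.
class SongSection {
  final String name;
  final String lyrics;

  SongSection({required this.name, required this.lyrics});

  @override
  String toString() => "SongSection(name: '$name', lyrics: '$lyrics')";
}

// Group 1: the section name after '#'.
// Group 2: every following line that is not itself a '#Name' header line.
// multiLine: true lets '$' in the lookahead match at the end of each line.
final RegExp _sectionPattern = RegExp(
  r'#([a-zA-Z0-9]+)((?:\n(?!#[a-zA-Z0-9]+$).*)*)',
  multiLine: true,
);

// Hypothetical helper: turns the raw lyrics into one SongSection per header.
List<SongSection> parseSections(String lyrics) {
  return _sectionPattern
      .allMatches(lyrics)
      .map((m) => SongSection(
            name: m.group(1)!,
            // Group 2 starts with the newline after the name; trim() removes
            // it together with any blank lines before the next header.
            lyrics: m.group(2)!.trim(),
          ))
      .toList();
}

void main() {
  const lyrics = '#Chorus\n'
      'This is the chorus\n'
      'Got no words for it\n'
      '\n'
      '#Verse\n'
      'This is the verse';
  print(parseSections(lyrics));
}
```

With the sample lyrics from the question this should produce two sections, Chorus and Verse, each holding only its own lines. The lookahead stops group 2 at the #Verse line, and the next allMatches iteration picks that header up as a new match.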

