How do i write a RegEx that starts reading from behind?

Time:04-29

I have a series of words that I am trying to capture.

I have the following problem:

  • The string ends with a fixed set of words
  • The number of words in the string is not fixed. However, the match should capture all words that start with an uppercase letter (the text is German). The left anchor should therefore be the first word that starts with a lowercase letter when reading backwards from the end.

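Example (the part I am trying to capture is repeated in quotes after each sentence):

  • I like Apple Bananas And Cars. (capture: "Apple Bananas And Cars")

  • building houses Might Be Salty Hard said Jessica. (capture: "Might Be Salty Hard")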

This is the RegEx I have tried so far. It only works if the "non-capture" part of the string does not contain any uppercase words: /(?:[a-zäöü]*)([\p{L} ().&] [Cars|Hard])/gu

CodePudding user response:

You can start the match with an uppercase character (allowing the German uppercase characters as well), and then optionally repeat matching either words that start with an uppercase character or one of the "special" characters.

Then end the match with an alternation matching either Hard or Cars.

(?<!\S)[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]*(?:\s+(?:[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]*|[ ()&]))*\s+(?:Hard|Cars)\b

Explanation

  • (?<!\S) Assert a whitespace boundary to the left to prevent starting the match after a non-whitespace char
  • [A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]* Match a word that starts with an uppercase char
  • (?: Non-capture group to match as a whole part
    • \s+ Match 1 or more whitespace chars
    • (?: Non-capture group
      • [A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]* Match a word that starts with an uppercase char
      • | Or
      • [ ()&] Match one of the "special" chars
    • ) Close the non-capture group
  • )* Close the non-capture group and optionally repeat it
  • \s+ Match 1 or more whitespace chars
  • (?:Hard|Cars) Match one of the alternatives
  • \b A word boundary to prevent a partial word match
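
As a rough check, the pattern can be run against the two sentences from the question. This is only a sketch under a few assumptions: it uses JavaScript (guessed from the /gu literal in the question) on an engine that supports lookbehind, it writes \s+ for the "one or more whitespace chars" steps, and the helper name findCapitalizedRuns is made up for illustration.

```javascript
// Pattern from the explanation above. The `+` after each `\s` is an
// assumption ("one or more whitespace chars").
const pattern =
  /(?<!\S)[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]*(?:\s+(?:[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]*|[ ()&]))*\s+(?:Hard|Cars)\b/g;

// The two example sentences from the question.
const samples = [
  "I like Apple Bananas And Cars.",
  "building houses Might Be Salty Hard said Jessica.",
];

// Hypothetical helper: returns all matches in the text, or an empty array.
function findCapitalizedRuns(text) {
  return text.match(pattern) || [];
}

for (const sample of samples) {
  console.log(findCapitalizedRuns(sample));
}
```

The leading "I" in the first sentence is not part of the match, because the word after it ("like") is lowercase and is not Hard or Cars, so the engine gives up on that starting position and tries again at "Apple".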

See a regex demo.

CodePudding user response:

Use \p{Lu} for uppercase letters:

(?:[\p{Lu} ()&][\p{L} ()&]* )+(?:Cars|Hard)

See live demo (showing matching umlauted letters and ß).
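
To illustrate the \p{Lu} idea, here is a minimal sketch. The pattern is not this answer's exact expression but an assumed variant that puts \p{Lu} and \p{L} into the word-by-word structure of the first answer. It assumes JavaScript, where \p{...} property escapes need the u flag. The third sentence is made up to exercise umlauts.

```javascript
// Assumed variant: \p{Lu} (uppercase letter) and \p{L} (any letter)
// replace the hand-written German letter classes.
// The `u` flag is required for \p{...} in JavaScript.
const unicodePattern =
  /(?<!\S)\p{Lu}\p{L}*(?:\s+(?:\p{Lu}\p{L}*|[()&]))*\s+(?:Hard|Cars)\b/gu;

const sentences = [
  "I like Apple Bananas And Cars.",
  "building houses Might Be Salty Hard said Jessica.",
  "wir kaufen Äpfel Und Öl Cars heute.", // made-up sentence with umlauts
];

for (const sentence of sentences) {
  // match() with the `g` flag returns an array of matches, or null.
  console.log(sentence.match(unicodePattern));
}
```

With this variant, every word in the run must start with an uppercase letter in any script, so no German-specific characters need to be listed by hand.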
