Python Get text between keywords

Time:07-07

I would like to get the text between the keywords [en] and [ja].

So for the following example:

[en]
Text
- Example
- Example
- Example

[ja]
Text
 - 例
 - 例
 - 例

I need it to return only:

Text
- Example
- Example
- Example

I have tried using regex:

([en])(.|\n)*?([ja])

But it only grabs the first 2 characters of the first line. What am I doing wrong here?

CodePudding user response:

This pattern captures all the text between [en] and [ja]:

(?<=\[en\]\n)(?:(?:.*\n)*?)(?=\n\[ja\])
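
As a rough sketch of how the lookaround pattern above might be used from Python, the snippet below runs it against the sample from the question. The variable names (text, pattern, en_block) are illustrative only, and the pattern is assumed to use the lazy quantifier *? on the inner group. Because the lookbehind and lookahead do not consume the markers, the whole match is already the wanted text.

```python
import re

# Sample input copied from the question.
text = """[en]
Text
- Example
- Example
- Example

[ja]
Text
 - 例
 - 例
 - 例
"""

# The lookbehind and lookahead only assert that the markers are present,
# so the match itself (group 0) is the text between them.
pattern = re.compile(r"(?<=\[en\]\n)(?:(?:.*\n)*?)(?=\n\[ja\])")

match = pattern.search(text)
en_block = match.group(0) if match else None
print(en_block)
```

Note that the lookahead expects an empty line before [ja], as in the sample; input without that blank line would not match this pattern.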


CodePudding user response:

You may use this regex for capturing text between [en] and [ja]:

\[en]\n((?:.*\n)*?)\n\[ja]


RegEx Details:

  • \[en]\n: Match [en] followed by a line break
  • ((?:.*\n)*?): Match any text followed by a line break. Repeat this group 0 or more times (lazy matching) and capture the matched text in group #1
  • \n\[ja]: Match line break followed by [ja]
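
A minimal sketch of the capture-group approach described above, assuming Python's re module. The helper name extract_en and the variable names are hypothetical, chosen only for illustration. The last lines also show the likely cause of the original problem: without the backslash, [en] is a character class that matches a single "e" or "n" rather than the literal marker.

```python
import re

# The pattern from the answer above: group 1 holds the lines between the markers.
PATTERN = re.compile(r"\[en]\n((?:.*\n)*?)\n\[ja]")


def extract_en(text):
    """Return the text between [en] and [ja], or None (hypothetical helper)."""
    match = PATTERN.search(text)
    # Group 1 ends with a line break; strip it to keep only the content lines.
    return match.group(1).rstrip("\n") if match else None


# Sample input copied from the question.
sample = "[en]\nText\n- Example\n- Example\n- Example\n\n[ja]\nText\n - 例\n - 例\n - 例\n"

result = extract_en(sample)
print(result)

# Unescaped, [en] is a character class that matches one "e" or one "n".
unescaped = re.search(r"[en]", "[en]").group(0)
print(unescaped)
```

Returning None when the markers are missing is a design choice for this sketch; raising an exception would work equally well if missing markers should be treated as an error.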