How to extract a substring using a regular expression

Time:04-23

I have text like this: ...some text [wrong answer\#correct answer\wrong answer] some text...

I need to extract two kinds of substrings from the text inside the square brackets. They are used in different places, so I need two separate regular expressions:

  1. All wrong answers, without the \ separator
  2. All correct answers (the ones that begin with #), without the \ and the #

Each answer, correct or wrong, can be a string of any length containing any characters except [, ], \ and #. Ideally the expressions should not depend on the number of answers: there may be several correct and several wrong answers, and their order may vary. Any ideas how to do this with a regex?

CodePudding user response:

To be sure that a match is one of the answers, we need to check which character comes before it. A positive lookbehind, (?<=Y)X, does this: it means "find X only if Y comes immediately before it". The answer itself is then any run of characters other than the special ones (\, #, [ and ]); a negated character class such as [^XYZ] matches any single character except X, Y and Z.

To fix the issue raised in the comment, we also need to check what comes right after the answer. There are two possibilities: \ or ]. For this we use a positive lookahead, which works like a lookbehind but checks the text after X. For example, X(?=Y) means "find X only if Y comes immediately after it".
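The building blocks can be tried out in isolation first. Below is a minimal sketch using Python's re module (the question does not name a regex flavor, so Python is an assumption here); the sample strings and variable names are made up for illustration.

```python
import re

# (?<=Y)X : match X only when Y comes immediately before it.
after_hash = re.findall(r"(?<=#)\w+", "a #b c #d")

# X(?=Y) : match X only when Y comes immediately after it.
before_semicolon = re.findall(r"\w+(?=;)", "a; b c;")

# [^XYZ]+ : a run of characters that are none of X, Y, Z.
between_separators = re.findall(r"[^,;]+", "a,b;c")

print(after_hash)          # ['b', 'd']
print(before_semicolon)    # ['a', 'c']
print(between_separators)  # ['a', 'b', 'c']
```

Neither lookaround consumes any text, which is why the # and the ; do not show up in the results.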

Final patterns are:

  • Wrong answers: (?<=\\|\[)[^\\#\[\]]+(?=\\|\])
  • Correct answers: (?<=#)[^\\#\[\]]+(?=\\|\])
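As a quick check, here is a sketch that applies both patterns with Python's re module (the flavor is an assumption, and the sample text and the names WRONG and CORRECT are illustrative). The character class is followed by + so that each match covers a whole answer rather than a single character.

```python
import re

# Wrong answers: preceded by \ or [, followed by \ or ].
WRONG = re.compile(r"(?<=\\|\[)[^\\#\[\]]+(?=\\|\])")
# Correct answers: preceded by #, followed by \ or ].
CORRECT = re.compile(r"(?<=#)[^\\#\[\]]+(?=\\|\])")

# Two wrong and two correct answers, in mixed order.
text = r"some text [wrong one\#right one\wrong two\#right two] some text"

wrong_answers = WRONG.findall(text)
correct_answers = CORRECT.findall(text)

print(wrong_answers)    # ['wrong one', 'wrong two']
print(correct_answers)  # ['right one', 'right two']
```

Both lookbehind alternatives are one character wide, which matters in Python because its re module only accepts fixed-width lookbehinds. Note that the lookarounds only inspect the characters directly next to an answer; they do not verify that the match sits inside a balanced pair of brackets.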