Ignore an optional word if present in a string - regular expression in Python

I'm trying to match a string with a regular expression using Python, but ignore an optional word if it's present.

For example, I have the following lines:

First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]

I'm looking to capture everything before [Ignore This Part]. Notice I also want to exclude the whitespace before [Ignore This Part]. Therefore my results should look like this:

First string
Second string
Third string (1)

I have tried the following regular expression with no luck, because it still captures [Ignore This Part]:

.+(?:\s\[.+\])?
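A quick check in Python shows why the attempt keeps the bracketed part. This sketch reads the quantifiers in the attempt as + (an assumption about the original post). Because . also matches [, the greedy .+ consumes the whole line, and the optional group is then satisfied by matching nothing.

```python
import re

# The attempt from the question, reading its quantifiers as "+".
attempt = re.compile(r".+(?:\s\[.+\])?")

line = "Second string [Ignore This Part]"
# "." also matches "[", so the greedy ".+" consumes the whole line and the
# optional group is then satisfied by matching nothing.
matched = attempt.match(line).group(0)
print(matched)  # Second string [Ignore This Part]
```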

Any assistance would be appreciated.

I'm using Python 3.8 on Windows 10.

Edit: The examples are meant to be processed one line at a time.

Answer:

Use [^[\n] instead of . so the match stops before the opening square bracket and doesn't run across newlines.

^[^[\n]+(?:\s\[.+\])?

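A minimal sketch of using this pattern line by line in Python, assuming the quantifier is + and the group is the non-capturing (?:...). One caveat: [^[\n]+ already stops just before [, so the match ends with the space in front of the bracket, which the question wants excluded. The sketch trims it with rstrip().

```python
import re

# Pattern from this answer: one or more characters that are neither "[" nor
# a newline, followed by an optional " [...]" group.
pattern = re.compile(r"^[^[\n]+(?:\s\[.+\])?")

lines = ["First string",
         "Second string [Ignore This Part]",
         "Third string (1) [Ignore This Part]"]

# [^[\n]+ stops just before "[", so the optional group never takes part in
# the match and group(0) ends with the space before "[". rstrip() drops it.
results = [pattern.match(line).group(0).rstrip() for line in lines]
print(results)  # ['First string', 'Second string', 'Third string (1)']
```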

Answer:

Based on your shown samples, please try the following code, written and tested in Python 3.

import re
var="""First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]"""

[x for x in list(map(lambda x:x.strip(),re.split(r'(?m)(.*?)(?:$|\s\[[^]]*\])',var))) if x]

The output is the following list, which can be used as required:

['First string', 'Second string', 'Third string (1)']

Here is a detailed explanation of the above Python 3 code:

First, use the re module's split function with the regex (.*?)(?:$|\s\[[^]]*\]) and the multiline flag (?m) enabled. Because (.*?) is a capturing group, split also returns the captured text (the part before any [...]) in its result. The complete call is: re.split(r'(?m)(.*?)(?:$|\s\[[^]]*\])',var)
Then pass each element of that result to a lambda that calls strip, which turns the pieces containing only newlines into empty strings.
Apply that lambda with map and build a list from the result.
Finally, filter out the empty strings so that only the parts the OP asked for remain.
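Since the question's edit says each line is processed on its own, the split, strip and filter steps can be skipped by matching one line at a time. The following is a sketch that adapts this answer's pattern rather than reproducing the answer's code: the trailing $ is an addition not present in the answer, and strip_bracket_suffix is a made-up helper name.

```python
import re

# Adapted from this answer's pattern for single-line input. The trailing $ is
# an addition: it forces the lazy (.*?) group to extend until the rest of the
# line is either empty or a " [...]" suffix.
LINE_PATTERN = re.compile(r"(.*?)(?:\s\[[^]]*\])?$")


def strip_bracket_suffix(line):
    """Return the text of one line before an optional trailing ' [...]' part."""
    return LINE_PATTERN.match(line).group(1)


lines = ["First string",
         "Second string [Ignore This Part]",
         "Third string (1) [Ignore This Part]"]

results = [strip_bracket_suffix(line) for line in lines]
print(results)  # ['First string', 'Second string', 'Third string (1)']
```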

Answer:

Perhaps you can remove the part that you don't want to match:

[^\S\n]*\[[^][\n]*]$

Explanation

[^\S\n]* Match optional whitespace characters other than a newline
\[[^][\n]*] Match [...] containing no square brackets or newlines
$ End of the line (with the re.M flag)


Example

import re

pattern = r"[^\S\n]*\[[^][\n]*]$"

s = ("First string\n"
            "Second string [Ignore This Part]\n"
            "Third string (1) [Ignore This Part]")

result = re.sub(pattern, "", s, flags=re.M)  # re.M makes $ match at the end of every line

if result:
    print(result)

Output

First string
Second string
Third string (1)

If you don't want to be left with an empty string when a line contains only the bracketed part, you can assert a non-whitespace character to the left:

(?<=\S)[^\S\n]*\[[^][\n]*]$

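To see what the lookbehind changes, the sketch below applies both patterns to a hypothetical input whose second line consists only of the bracketed part. Without (?<=\S) that line is reduced to an empty string; with it, the line is left unchanged.

```python
import re

# Both patterns from this answer. The (?<=\S) lookbehind requires a
# non-whitespace character right before the removed part.
plain = r"[^\S\n]*\[[^][\n]*]$"
guarded = r"(?<=\S)[^\S\n]*\[[^][\n]*]$"

# A hypothetical extra line that consists only of the bracketed part.
s = "Second string [Ignore This Part]\n[Ignore This Part]"

removed_all = re.sub(plain, "", s, flags=re.M)
kept_bare = re.sub(guarded, "", s, flags=re.M)
print(repr(removed_all))  # 'Second string\n'
print(repr(kept_bare))    # 'Second string\n[Ignore This Part]'
```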

Answer:

You may use this regex:

^.+?(?=$|\s*\[[^]]*]$)


If you want a better-performing regex, I suggest:

^\S+(?:\s+\S+)*?(?=$|\s*\[[^]]*]$)


RegEx Details:

^: Start
.+?: Match 1 or more of any characters (lazy match)
(?=: Start lookahead
- $: End
- |: OR
- \s*: Match 0 or more whitespaces
- \[[^]]*]: Match [...] text
- $: End
): Close lookahead
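A quick way to try both patterns in Python, reading the quantifiers as + (an assumption where they appear as spaces) and matching each sample line separately. This only checks the results on the sample lines; it makes no claim about relative performance.

```python
import re

# The two patterns from this answer, with "+" as the quantifiers.
simple = re.compile(r"^.+?(?=$|\s*\[[^]]*]$)")
faster = re.compile(r"^\S+(?:\s+\S+)*?(?=$|\s*\[[^]]*]$)")

lines = ["First string",
         "Second string [Ignore This Part]",
         "Third string (1) [Ignore This Part]"]

# Each line is matched on its own, as the question's edit specifies.
simple_results = [simple.match(line).group(0) for line in lines]
faster_results = [faster.match(line).group(0) for line in lines]
print(simple_results)  # ['First string', 'Second string', 'Third string (1)']
print(faster_results)  # ['First string', 'Second string', 'Third string (1)']
```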