Ignore an optional word if present in a string - regular expression in python

Time:07-27

I'm trying to match a string with a regular expression in Python, but ignore an optional word if it's present.

For example, I have the following lines:

First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]

I'm looking to capture everything before [Ignore This Part]. Notice I also want to exclude the whitespace before [Ignore This Part]. Therefore my results should look like this:

First string
Second string
Third string (1)

I have tried the following regular expression with no luck, because it still captures [Ignore This Part]:

.+(?:\s\[.+\])?

Any assistance would be appreciated.

I'm using Python 3.8 on Windows 10.

Edit: The examples are meant to be processed one line at a time.
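
A short sketch of why the attempted pattern keeps the bracketed text, assuming the pattern is read as .+(?:\s\[.+\])? and applied with re.match to one line at a time (the variable names are illustrative). The greedy .+ consumes the whole line first, and since the trailing group is optional, the engine never has to backtrack:

```python
import re

lines = [
    "First string",
    "Second string [Ignore This Part]",
    "Third string (1) [Ignore This Part]",
]

# The attempted pattern from the question.
attempt = re.compile(r".+(?:\s\[.+\])?")

# Greedy ".+" swallows the entire line; the optional group then matches nothing.
matched = [attempt.match(line).group(0) for line in lines]
for text in matched:
    print(repr(text))
```

Each printed value is the full input line, bracketed part included.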

CodePudding user response:

Use [^[\n] instead of . so that the match stops at the first square bracket and cannot run across newlines.

^[^[\n]+(?:\s\[.+\])?
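
One possible way to apply the negated class per line, sketched under two assumptions that are not part of the answer: only the [^[\n]+ prefix is matched, and trailing whitespace is removed afterwards with rstrip, because the greedy class also consumes the space in front of the bracket. The helper name before_bracket is hypothetical.

```python
import re

# Negated class from the answer: stop at the first "[" or newline.
PREFIX = re.compile(r"[^[\n]+")

def before_bracket(line):
    """Return the text before the first "[", with trailing whitespace removed."""
    m = PREFIX.match(line)
    # A line that starts with "[" gives no match, so fall back to "".
    return m.group(0).rstrip() if m else ""

lines = [
    "First string",
    "Second string [Ignore This Part]",
    "Third string (1) [Ignore This Part]",
]

results = [before_bracket(line) for line in lines]
print(results)
```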

CodePudding user response:

With your shown samples, please try the following code, written and tested in Python 3.

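import re

var = """First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]"""

# Split on the pattern, strip each piece, and keep only the non-empty ones.
result = [x for x in list(map(lambda x: x.strip(), re.split(r'(?m)(.*?)(?:$|\s\[[^]]*\])', var))) if x]
print(result)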

The output is a list, which can then be accessed as required:

['First string', 'Second string', 'Third string (1)']

Here is a detailed explanation of the Python 3 code above:

  • First, re.split is called with the regex (.*?)(?:$|\s\[[^]]*\]) and the multiline flag (?m) enabled. Because the wanted text sits in a capturing group, it is kept in the list that split returns. The full call is: re.split(r'(?m)(.*?)(?:$|\s\[[^]]*\])',var)
  • Then a lambda calls strip on each element, which turns the elements that contain only newlines into empty strings.
  • map applies the lambda to every element, and list builds a list from the result.
  • Finally, the list comprehension drops the empty strings, leaving only the parts the OP asked for.
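
The steps above can also be unrolled into separate statements, which makes the intermediate values easier to inspect. This is only a sketch of the same idea; the names pieces, stripped and result are illustrative.

```python
import re

var = """First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]"""

# Step 1: split; the capturing group keeps each line's wanted text in the output.
pieces = re.split(r'(?m)(.*?)(?:$|\s\[[^]]*\])', var)

# Step 2: strip surrounding whitespace, so newline-only pieces become empty.
stripped = [piece.strip() for piece in pieces]

# Step 3: drop the empty strings.
result = [piece for piece in stripped if piece]
print(result)
```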

CodePudding user response:

Perhaps you can remove the part that you don't want to match:

[^\S\n]*\[[^][\n]*]$

Explanation

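  • [^\S\n]* Match optional whitespace characters other than a newline
  • \[[^][\n]*] Match from [ to the closing ], without crossing another bracket or a newline
  • $ End of string (or end of each line when re.M is used, as in the example)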

Example

import re

pattern = r"[^\S\n]*\[[^][\n]*]$"

s = ("First string\n"
     "Second string [Ignore This Part]\n"
     "Third string (1) [Ignore This Part]")

# Pass flags by keyword; re.M makes $ match at the end of every line.
result = re.sub(pattern, "", s, flags=re.M)

if result:
    print(result)

Output

First string
Second string
Third string (1)

If you don't want to be left with an empty string when a line consists only of the bracketed part, you can assert a non-whitespace character to the left:

(?<=\S)[^\S\n]*\[[^][\n]*]$
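
A small sketch of the difference the lookbehind makes, assuming a line that contains nothing but the bracketed part (the names plain, guarded, bracket_only and normal are illustrative):

```python
import re

plain = re.compile(r"[^\S\n]*\[[^][\n]*]$", re.M)
guarded = re.compile(r"(?<=\S)[^\S\n]*\[[^][\n]*]$", re.M)

bracket_only = "[Ignore This Part]"
normal = "Second string [Ignore This Part]"

# Without the lookbehind the whole line is removed.
print(repr(plain.sub("", bracket_only)))

# With the lookbehind nothing precedes "[", so the line is left untouched.
print(repr(guarded.sub("", bracket_only)))

# An ordinary line is trimmed the same way by both patterns.
print(repr(plain.sub("", normal)))
print(repr(guarded.sub("", normal)))
```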

CodePudding user response:

You may use this regex:

^.+?(?=$|\s*\[[^]]*]$)

If you want a better-performing regex, I suggest:

^\S+(?:\s+\S+)*?(?=$|\s*\[[^]]*]$)

RegEx Details:

  • ^: Start
  • .+?: Match 1 or more of any character (lazy match)
  • (?=: Start lookahead
    • $: End
    • |: OR
    • \s*: Match 0 or more whitespaces
    • \[[^]]*]: Match [...] text
    • $: End
  • ): Close lookahead
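
A minimal sketch of how both lookahead patterns might be used on one line at a time, assuming they are read as ^.+?(?=$|\s*\[[^]]*]$) and ^\S+(?:\s+\S+)*?(?=$|\s*\[[^]]*]$). The helper capture and the variable names are hypothetical. Because the bracketed part is only inspected by the lookahead, it never becomes part of the match:

```python
import re

lazy = re.compile(r"^.+?(?=$|\s*\[[^]]*]$)")
unrolled = re.compile(r"^\S+(?:\s+\S+)*?(?=$|\s*\[[^]]*]$)")

lines = [
    "First string",
    "Second string [Ignore This Part]",
    "Third string (1) [Ignore This Part]",
]

def capture(pattern, line):
    """Return the matched prefix of the line, or None when nothing matches."""
    m = pattern.search(line)
    return m.group(0) if m else None

lazy_results = [capture(lazy, line) for line in lines]
unrolled_results = [capture(unrolled, line) for line in lines]
print(lazy_results)
print(unrolled_results)
```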