I'm trying to match a string with a regular expression in Python, but ignore an optional word if it's present.
For example, I have the following lines:
First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]
I'm looking to capture everything before [Ignore This Part]. Notice I also want to exclude the whitespace before [Ignore This Part]. Therefore my results should look like this:
First string
Second string
Third string (1)
I have tried the following regular expression with no luck, because it still captures [Ignore This Part]:
.+(?:\s\[.+\])?
Any assistance would be appreciated.
I'm using Python 3.8 on Windows 10.
Edit: The examples are meant to be processed one line at a time.
CodePudding user response:
Use [^[] instead of . so that it doesn't match anything containing an opening square bracket; adding \n to the class also keeps it from matching across newlines.
^[^[\n]+(?:\s\[.+\])?
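To make the difference concrete, here is a minimal sketch, assuming each line is processed on its own as the question's edit says. It uses only the leading [^[\n]+ part of the pattern, since that is the part that selects the text to keep, and the helper name leading_text is made up for illustration. Note that the negated class also consumes the space before the bracket, so the sketch strips it afterwards.

```python
import re

lines = [
    "First string",
    "Second string [Ignore This Part]",
    "Third string (1) [Ignore This Part]",
]

# The attempt from the question: the greedy .+ swallows the whole line,
# so the optional group has nothing left to match.
greedy = re.compile(r".+(?:\s\[.+\])?")
greedy_results = [greedy.match(line).group(0) for line in lines]

# Negated character class: stop at the first "[" (or newline).
negated = re.compile(r"^[^[\n]+")


def leading_text(line):  # hypothetical helper name
    match = negated.match(line)
    # The class also consumes the space before "[", so strip it afterwards.
    return match.group(0).rstrip() if match else ""


results = [leading_text(line) for line in lines]
print(greedy_results)  # each entry is the full, unchanged line
print(results)         # ['First string', 'Second string', 'Third string (1)']
```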
CodePudding user response:
With your shown samples, please try the following code, written and tested in Python 3.
import re
var="""First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]"""
print([x for x in list(map(lambda x:x.strip(),re.split(r'(?m)(.*?)(?:$|\s\[[^]]*\])',var))) if x])
The output is a list, which can be accessed as required:
['First string', 'Second string', 'Third string (1)']
Here is a detailed explanation of the above Python 3 code:
- First, use the re module's split function with the regex (.*?)(?:$|\s\[[^]]*\]) and the multiline flag enabled. The complete split call is: re.split(r'(?m)(.*?)(?:$|\s\[[^]]*\])',var)
- Then pass its output to a lambda function that uses strip to remove leading and trailing whitespace, including newlines, from each element.
- Apply map to it and create a list from the result.
- Finally, remove the empty items from the list to keep only the parts the OP requires.
CodePudding user response:
Perhaps you can remove the part that you don't want to match:
[^\S\n]*\[[^][\n]*]$
Explanation
- [^\S\n]* Match optional spaces (any whitespace except a newline)
- \[[^][\n]*] Match from [....]
- $ End of string (with re.M, the end of each line)
Example
import re
pattern = r"[^\S\n]*\[[^][\n]*]$"
s = ("First string\n"
     "Second string [Ignore This Part]\n"
     "Third string (1) [Ignore This Part]")
# re.M makes $ match at the end of every line, not only at the end of the string.
result = re.sub(pattern, "", s, flags=re.M)
if result:
    print(result)
Output
First string
Second string
Third string (1)
If you don't want to be left with an empty string, you can assert a non-whitespace character to the left:
(?<=\S)[^\S\n]*\[[^][\n]*]$
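As a small illustration of what the lookbehind changes, the sketch below compares both patterns on a line that consists only of the bracketed part. The variable names plain, guarded, only_brackets, and normal are made up for this example.

```python
import re

plain = re.compile(r"[^\S\n]*\[[^][\n]*]$")
guarded = re.compile(r"(?<=\S)[^\S\n]*\[[^][\n]*]$")

only_brackets = "[Ignore This Part]"
normal = "Second string [Ignore This Part]"

# Without the lookbehind the whole line is removed.
plain_only = plain.sub("", only_brackets)
# With the lookbehind nothing precedes the bracket, so the line is left alone.
guarded_only = guarded.sub("", only_brackets)
# A regular line is still trimmed as before.
guarded_normal = guarded.sub("", normal)

print(repr(plain_only))      # ''
print(repr(guarded_only))    # '[Ignore This Part]'
print(repr(guarded_normal))  # 'Second string'
```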
CodePudding user response:
You may use this regex:
^.+?(?=$|\s*\[[^]]*]$)
If you want a better-performing regex, I suggest:
^\S+(?:\s+\S+)*?(?=$|\s*\[[^]]*]$)
RegEx Details:
- ^: Start
- .+?: Match 1+ of any characters (lazy match)
- (?=: Start lookahead
  - $: End
  - |: OR
  - \s*: Match 0 or more whitespaces
  - \[[^]]*]: Match [...] text
  - $: End
- ): Close lookahead
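A minimal sketch of both patterns applied one line at a time, assuming the quantifiers read .+? in the first pattern and \S+ / \s+ in the second. The names lazy and unrolled are made up for illustration.

```python
import re

# Lazy match up to the lookahead.
lazy = re.compile(r"^.+?(?=$|\s*\[[^]]*]$)")
# Word-by-word variant suggested as the better-performing option.
unrolled = re.compile(r"^\S+(?:\s+\S+)*?(?=$|\s*\[[^]]*]$)")

lines = [
    "First string",
    "Second string [Ignore This Part]",
    "Third string (1) [Ignore This Part]",
]

# The lookahead only checks what follows, so the whitespace and the
# bracketed part never become part of the match itself.
lazy_results = [lazy.match(line).group(0) for line in lines]
unrolled_results = [unrolled.match(line).group(0) for line in lines]

print(lazy_results)
print(unrolled_results)
```

Because the lookahead is zero-width, no capture group or trailing strip is needed here; group(0) is already the wanted text.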