Python Split Regex not split what I need

Time:03-21

I have this code in my file:

import re

sample = """Name: @s
Owner: @a[tag=Admin]"""

target = r"@[sae](\[[\w{}=, ]*\])?"
regex = re.split(target, sample)

print(regex)

I want to split the string around every token that starts with @, keeping the tokens themselves, like this:
["Name: ", "@s", "\nOwner: ", "@a[tag=Admin]"]

But instead it gives this:
['Name: ', None, '\nOwner: ', '[tag=Admin]', '']

How can I separate it correctly?

CodePudding user response:

I would use re.findall here:

import re

sample = """Name: @s
Owner: @a[tag=Admin]"""
parts = re.findall(r'@\w+(?:\[.*?\])?|\s*\S+\s*', sample)
print(parts)  # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']

The regex pattern used here says to match:

@\w+          a tag such as @s or @some_tag
(?:\[.*?\])?  followed by an optional [...] term
|             OR
\s*\S+\s*     any other run of non-whitespace characters,
              including optional whitespace on both sides
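
As a side note on why the original call misbehaved: re.split inserts the text of every capturing group into its result, which is where the None and '[tag=Admin]' entries came from. Below is a minimal sketch that keeps re.split, assuming the question's @[sae] selector pattern is what should be matched. The outer capturing group and the filtering step are additions for illustration, not part of the original code.

```python
import re

sample = """Name: @s
Owner: @a[tag=Admin]"""

# The outer parentheses capture the whole selector, so re.split keeps it.
# The inner group is non-capturing (?:...), so it adds no extra entries.
target = r"(@[sae](?:\[[\w{}=, ]*\])?)"

parts = re.split(target, sample)
# re.split leaves an empty string when a match ends the input; drop it.
parts = [p for p in parts if p]

print(parts)  # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']
```

Unlike the findall pattern, this only treats @s, @a and @e as tags, which may or may not be what is wanted.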

CodePudding user response:

If I understand the requirements correctly you could do that as follows:

import re
s = """Name: @s
Owner: @a[tag=Admin]
"""
rgx = r'(?=@.*)|(?=\r?\n[^@\r\n]*)'
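parts = re.split(rgx, s)  # splitting on empty matches needs Python 3.7+
print(parts)
# ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]', '\n']
# (the final '\n' element comes from the trailing newline in s)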


The regular expression can be broken down as follows.

(?=         # begin a positive lookahead
  @.*       # match '@' followed by >= 0 chars other than line terminators
)           # end positive lookahead
|           # or
(?=         # begin a positive lookahead
  \r?\n     # match a line terminator
  [^@\r\n]* # match >= 0 characters other than '@' and line terminators 
)           # end positive lookahead

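Notice that both alternatives are lookaheads, so every match is zero-width: re.split cuts the string at those positions without consuming any characters. Splitting on empty matches requires Python 3.7 or later.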
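
To illustrate the zero-width point, here is a small sketch that writes the same pattern with re.VERBOSE and applies it to the question's sample (without a trailing newline, which is an assumption made here to match the desired output). Because nothing is consumed, joining the pieces gives back the original string.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so comments can sit inline.
rgx = re.compile(r"""
    (?= @ .* )               # a position followed by '@'
  |                          # or
    (?= \r? \n [^@\r\n]* )   # a position followed by a line terminator
""", re.VERBOSE)

s = "Name: @s\nOwner: @a[tag=Admin]"
parts = rgx.split(s)  # requires Python 3.7+ for empty-match splitting

print(parts)  # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']

# Zero-width matches remove nothing, so the pieces reassemble exactly.
assert "".join(parts) == s
```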
