regex for blank string

Time:11-09

I have a string as:

s=

"(2021-06-29T10:53:42.647Z) [Denis]: hi
(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING
(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane 
(2021-06-29T11:58:29.053Z) [Nicholas]: 
(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#
(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021
(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##"

I want to extract the message text from it. Expected output:

comments=['hi','TA FOR SHOWING','how are you bane',' ','#END_REMOTE#','VAL 01JUL2021','##ENDED AT 08:07 GMT##'] 

What I have tried is:

comments=re.findall(r']:\s+(.*?)\n',s)

The regex works well, but I'm not able to get the blank text as ''.

CodePudding user response:

Is this what you want?

comments = re.findall(r']:\s(.*?)\n',s)

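If there is always exactly one space after the :, \s+ should be \s. \s+ means one or more whitespace characters, so on a blank message it also consumes the newline and the match runs into the next line.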
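A minimal sketch of the difference, assuming the original attempt used \s+ and using a shortened copy of the sample string. The variable names greedy and single are only illustrative:

```python
import re

# Shortened sample: a normal line, a blank message, a marker line,
# and a last line without a trailing newline.
s = ("(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# \s+ is greedy: on the blank message it swallows the space AND the newline,
# so the lazy group then captures the whole next line.
greedy = re.findall(r']:\s+(.*?)\n', s)

# \s matches exactly one whitespace character and leaves the newline for \n,
# so the blank message is captured as ''.
single = re.findall(r']:\s(.*?)\n', s)

print(greedy)  # ['how are you bane ', '(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#']
print(single)  # ['how are you bane ', '', '#END_REMOTE#']
```

Both patterns require a trailing \n, so neither returns the last line when the string does not end with a newline.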

CodePudding user response:

You can instead exclude ] from the capture group with a negated character class. If you also want to match the value on the last line, assert the end of the line with $ (together with re.MULTILINE) instead of matching a mandatory newline with \n.

Note that \s can match a newline, and so can the negated character class [^]]*.

]:\s+([^]]*)$


import re

# "]:" plus whitespace, then capture non-"]" characters up to a line end ($ with re.MULTILINE)
regex = r"]:\s+([^]]*)$"

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
    "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

print(re.findall(regex, s, re.MULTILINE))

Output

['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##'] 
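The $ anchor only works per line here because re.MULTILINE is passed. A small sketch on a hypothetical two-line input (not from the question), assuming the pattern uses \s+, shows the difference:

```python
import re

# Hypothetical two-line input, no trailing newline.
s = "[A]: x\n[B]: y"
pattern = r"]:\s+([^]]*)$"

# Without the flag, $ matches only at the very end of the string
# (or just before a final trailing newline).
without_flag = re.findall(pattern, s)

# With re.MULTILINE, $ also matches just before every newline.
with_flag = re.findall(pattern, s, re.MULTILINE)

print(without_flag)  # ['y']
print(with_flag)     # ['x', 'y']
```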

If you don't want to cross lines:

]:[^\S\n]+([^]\n]*)$
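To see where the two patterns differ, here is a small sketch on made-up input. The names and the continuation lines are hypothetical, and the quantifier after the whitespace class is assumed to be +:

```python
import re

# Hypothetical input: message A wraps onto a second line, message B is blank
# and is followed by a stray line that has no "[name]:" prefix.
s = "[A]: first\ncontinued\n[B]: \nstray\n[C]: last"

# \s+ and [^]]* may both consume newlines, so a match can span lines.
crossing = re.findall(r"]:\s+([^]]*)$", s, re.MULTILINE)

# [^\S\n] is "whitespace except newline", and [^]\n]* also stops at a newline,
# so each match stays on its own line.
same_line = re.findall(r"]:[^\S\n]+([^]\n]*)$", s, re.MULTILINE)

print(crossing)   # ['first\ncontinued', 'stray', 'last']
print(same_line)  # ['first', '', 'last']
```

With the first pattern the blank message B is lost, because \s+ runs over the newline and the group picks up the stray line instead.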
