I have the following string:
s = """(2021-06-29T10:53:42.647Z) [Denis]: hi
(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING
(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane
(2021-06-29T11:58:29.053Z) [Nicholas]:
(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#
(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021
(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##"""
I want to extract the message text that follows each "[Name]:" prefix. Expected output:
comments=['hi','TA FOR SHOWING','how are you bane',' ','#END_REMOTE#','VAL 01JUL2021','##ENDED AT 08:07 GMT##']
What I have tried:
comments = re.findall(r']:\s+(.*?)\n', s)
The regex works for most lines, but I'm not able to get the blank message as ''.
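A minimal reproduction of the problem, assuming the blank line has nothing after the colon (as in the string above): because \s+ can consume the newline, the lazy (.*?) then runs on to the next newline and captures the whole following line. The last line is also dropped because it has no trailing \n.

```python
import re

# The sample from the question: the blank message has no trailing space
# and the last line has no trailing newline.
s = """(2021-06-29T10:53:42.647Z) [Denis]: hi
(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING
(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane
(2021-06-29T11:58:29.053Z) [Nicholas]:
(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#
(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021
(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##"""

# \s+ matches the newline after the empty "]:", so (.*?) captures the next line
comments = re.findall(r']:\s+(.*?)\n', s)
print(comments)
```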
Answer:
Is this what you want?
comments = re.findall(r']:\s(.*?)\n',s)
If the separator after : is always a single space, \s+ should be \s. \s+ means one or more whitespace characters, and a newline counts as whitespace, so it can run on into the next line.
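A quick check of that suggestion on two hypothetical, trimmed-down strings. If the blank message keeps a single trailing space after the colon, \s matches that space and the empty message comes back as ''. If there is nothing after the colon, \s matches the newline itself and the next line is captured instead. In both cases a last line without a trailing \n is not matched.

```python
import re

pattern = r']:\s(.*?)\n'

# Blank message followed by one space: \s consumes the space, (.*?) is empty.
# The last line has no trailing newline, so it is not matched.
with_space = "[Nicholas]: \n[Nicholas]: #END_REMOTE#\n[Denis]: last line"
print(re.findall(pattern, with_space))

# Blank message with nothing after the colon: \s consumes the newline instead
without_space = "[Nicholas]:\n[Nicholas]: #END_REMOTE#\n"
print(re.findall(pattern, without_space))
```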
Answer:
You can instead exclude ] in the capture group, and if you also want to match the value on the last line, you can assert the end of the line with $ (together with re.MULTILINE) instead of matching a mandatory newline with \n.
Note that \s+ can match a newline, and so can the negated character class [^]]*:
]:\s+([^]]*)$
import re

# \s+ skips the whitespace after "]:", ([^]]*) captures everything up to the next "]",
# and $ (with re.MULTILINE) forces the capture to stop at the end of a line
regex = r"]:\s+([^]]*)$"
s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
"(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
"(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
"(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
"(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
"(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
"(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")
print(re.findall(regex, s, re.MULTILINE))
Output
['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']
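The re.MULTILINE flag matters here: it makes $ match before every newline. A minimal check on a shortened version of the sample string shows that without the flag $ only matches at the very end of the string, so only the last message is found.

```python
import re

regex = r"]:\s+([^]]*)$"
s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# With MULTILINE, $ matches at the end of each line
print(re.findall(regex, s, re.MULTILINE))
# Without it, $ matches only at the end of the string (or before a final newline)
print(re.findall(regex, s))
```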
If you don't want the match to cross lines, exclude the newline from both parts ([^\S\n] is any whitespace except a newline):
]:[^\S\n]+([^]\n]*)$
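A sketch comparing that line-bounded pattern with a small variation, assuming the blank message may have nothing at all after the colon (as in the string in the question). [^\S\n]+ requires at least one space, so such a line is skipped entirely; swapping the + for * (a tweak that is not part of the answer above) lets the line match and return ''.

```python
import re

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]:\n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# [^\S\n] is "any whitespace except a newline"; [^]\n] is "anything but ] or a newline"
one_or_more = r"]:[^\S\n]+([^]\n]*)$"
zero_or_more = r"]:[^\S\n]*([^]\n]*)$"  # tweak: also allow no space after the colon

print(re.findall(one_or_more, s, re.MULTILINE))
print(re.findall(zero_or_more, s, re.MULTILINE))
```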