divide sentence into words using regex

Time:02-16

I want to divide a sentence into words using regex. I'm using this code:

import re
sentence='<30>Jan 11 11:45:50 test-tt systemd[1]: tester-test.service: activation successfully.'
sentence = re.split(r'\s|,|>|<|\[|\]:', sentence)

But I'm not getting what I expect.

The expected output is:

['30', 'Jan', '11', '11:45:50', 'test-tt', 'systemd', '1', 'tester-test.service: activation successfully.']

But what I'm getting is:

['', '30', 'Jan', '11', '11:45:50', 'test-tt', 'systemd', '1', '', 'tester-test.service:', 'activation', 'successfully.']

I tried to ignore the whitespace, but it should be ignored only in the last long "word", and I have no idea how to do that. Any suggestions or help? Thank you in advance.

CodePudding user response:

You can use

import re
sentence='<30>Jan 11 11:45:50 test-tt systemd[1]: tester-test.service: activation successfully.'
chunks = sentence.split(': ', 1)
result = re.findall(r'[^][\s,<>]+', chunks[0])
result.append(chunks[1])
print(result)
# => ['30', 'Jan', '11', '11:45:50', 'test-tt', 'systemd', '1', 'tester-test.service: activation successfully.']


Here,

  • chunks = sentence.split(': ', 1) - splits the sentence into two chunks at the first ': ' (colon followed by a space) substring
  • result = re.findall(r'[^][\s,<>]+', chunks[0]) - extracts all substrings consisting of one or more chars other than ], [, whitespace, ,, < and > chars from the first chunk
  • result.append(chunks[1]) - append the second chunk to the result list.
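
If you need this for more than one line, the same steps can be wrapped in a small helper. This is only a sketch: the name split_log_line is made up for illustration, and it uses str.partition instead of split so that a line without any ': ' does not raise an IndexError on chunks[1]. In that case it assumes you just want the header tokens.

```python
import re

def split_log_line(line):
    """Tokenize the part before the first ': ' and keep the rest as one item.

    Hypothetical helper, not part of any library.
    """
    # partition returns (head, separator, tail); separator is '' if ': ' is absent
    header, sep, message = line.partition(': ')
    # one or more chars that are not ], [, whitespace, comma, < or >
    tokens = re.findall(r'[^][\s,<>]+', header)
    if sep:
        # the message is appended untouched, spaces and colons included
        tokens.append(message)
    return tokens

sentence = '<30>Jan 11 11:45:50 test-tt systemd[1]: tester-test.service: activation successfully.'
print(split_log_line(sentence))
```

Because re.findall only returns the matched tokens, you also avoid the empty strings that re.split produces when two delimiters are adjacent or when the string starts with a delimiter.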