Python regular expression split by multiple delimiters

Time:11-22

Given the sentence "I want to eat fish and I want to buy a car. Therefore, I have to make money."

I want to split the sentence into

['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money']

I tried splitting the sentence with

re.split('.|and', sentence)

However, it splits the sentence by '.', 'a', 'n', and 'd'.

How can I split the sentence by '.' and 'and'?

CodePudding user response:

Besides escaping the dot (.), which otherwise matches any character except a newline, the pattern should also consume the whitespace around each delimiter so that no leading or trailing spaces remain in the results. A positive lookahead at the end, asserting that a non-whitespace character follows, prevents a split at the final dot:

re.split(r'\s*(?:\.|and)\s*(?=\S)', sentence)

This returns:

['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money.']

Demo: https://replit.com/@blhsing/LimitedVastCookies
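
One caveat with the pattern above is that a bare and also matches inside longer words such as "sand" or "candy". The sketch below is one possible variant, not part of the original answer: it wraps the word in \b word boundaries. The helper name split_sentence and the second test sentence are made up for illustration.

```python
import re

sentence = "I want to eat fish and I want to buy a car. Therefore, I have to make money."

# \b restricts "and" to a standalone word (assumption: that is the intent).
# The lookahead (?=\S) still prevents a split at the final dot.
pattern = re.compile(r"\s*(?:\.|\band\b)\s*(?=\S)")

def split_sentence(text):
    """Split text on '.' or the word 'and', dropping surrounding whitespace."""
    return pattern.split(text)

print(split_sentence(sentence))
# ['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money.']

print(split_sentence("I like sand and candy. I eat both."))
# ['I like sand', 'candy', 'I eat both.']
```

Without the \b anchors, the second sentence would also be split inside "sand" and "candy".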

CodePudding user response:

You need to escape the . in the regex, because an unescaped . matches any character.

import re

s = "I want to eat fish and I want to buy a car. Therefore, I have to make money."

re.split(r'\.|and', s)

Result:

['I want to eat fish ',
 ' I want to buy a car',
 ' Therefore, I have to make money',
 '']
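
The result above still carries stray spaces and a trailing empty string. If those are unwanted, one possible follow-up is to strip each piece and drop the empty ones. This is a minimal sketch; the helper name split_clean is hypothetical.

```python
import re

s = "I want to eat fish and I want to buy a car. Therefore, I have to make money."

def split_clean(text):
    """Split on '.' or 'and', then strip whitespace and discard empty pieces."""
    parts = re.split(r"\.|and", text)
    return [p.strip() for p in parts if p.strip()]

print(split_clean(s))
# ['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money']
```

This keeps the regex simple and moves the cleanup into ordinary Python, at the cost of a second pass over the list.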