Regex to split text with hyphen points

Time:02-12

Suppose that we have the following string:

'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd'

I want a regex that returns:

- Abcd efgh ejklm
- Efgh-ij sklrm, defasad
- KLMNNOP/QRS dasfdssa eadsd

I wrote the one below. It mostly works, but it cuts the match short when an item contains a hyphenated (compound) word.

import re
# start with hyphen + space + mix of different characters
regx = r'-\s[\w\s/?,;!:#&@]*'
z = re.findall(regx, 'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd')
for p in z:
    print(p)

Output

- Abcd efgh ejklm 
- Efgh
- KLMNNOP/QRS dasfdssa eadsd

Answer:

You could repeat matching either the current character class or a hyphen followed by word characters:

-\s(?:[\w\s/?,;!:#&@]+|-\w+)+


If you don't want to match empty parts, set the quantifier for the character class to + so that it matches 1 or more times.

Example

import re
# hyphen + whitespace, then runs of allowed characters or hyphen-joined word parts
regx = r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)+'
z = re.findall(regx, 'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd')
for p in z:
    print(p)

Output

- Abcd efgh ejklm 
- Efgh-ij sklrm, defasad 
- KLMNNOP/QRS dasfdssa eadsd
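
Each match above keeps the whitespace that precedes the next hyphen, because \s is part of the character class. A minimal sketch of tidying that up, assuming the trailing whitespace is not wanted (the names ITEM_RE and split_items are hypothetical and not part of the original answer):

```python
import re

# Same pattern as in the answer: hyphen + whitespace, then runs of allowed
# characters or hyphen-joined word parts.
ITEM_RE = re.compile(r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)+')

def split_items(text):
    """Return each hyphen-led item with surrounding whitespace stripped."""
    return [m.strip() for m in ITEM_RE.findall(text)]

text = ('We need the list fo the following products: - Abcd efgh ejklm '
        '- Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd')
for item in split_items(text):
    print(item)
```

Compiling the pattern once is only a convenience here; calling re.findall with the pattern string directly behaves the same way.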

A slightly broader variant allows the other listed characters, not only word characters, after an inner hyphen:

-\s(?:[\w\s/?,;!:#&@]+|-[\w/?,;!:#&@]+)+
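
To see where the narrower and the broader pattern differ, here is a small comparison sketch. The sample string is made up for illustration (it is not from the question), and the variable names are hypothetical. The sample contains a hyphen followed by '#', which is not a word character:

```python
import re

# Narrow: an inner hyphen must be followed by word characters.
narrow = re.compile(r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)+')
# Broad: an inner hyphen may be followed by any listed non-space character.
broad = re.compile(r'-\s(?:[\w\s/?,;!:#&@]+|-[\w/?,;!:#&@]+)+')

# Made-up sample: in the first item a hyphen is followed by '#'.
sample = 'Items: - Efgh-#12 abc - Xyz'

narrow_items = [m.strip() for m in narrow.findall(sample)]
broad_items = [m.strip() for m in broad.findall(sample)]

print(narrow_items)  # the first item stops before '-#12'
print(broad_items)   # the first item continues through '-#12 abc'
```

In both patterns the part after an inner hyphen excludes whitespace, so a hyphen followed by a space still starts a new item.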