Suppose that we have the following string:
'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd'
I want a regex that returns:
- Abcd efgh ejklm
- Efgh-ij sklrm, defasad
- KLMNNOP/QRS dasfdssa eadsd
I wrote this one, which mostly works, but it cuts the match short when an item contains a hyphenated (compound) word:
import re
regx = r'-\s[\w\s/?,;!:#&@]*'  # a hyphen, one whitespace character, then any mix of the listed characters
z = re.findall(regx, 'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd')
for p in z:
    print(p)
Output:
- Abcd efgh ejklm
- Efgh
- KLMNNOP/QRS dasfdssa eadsd
CodePudding user response:
You could repeat a group that matches either the current character class or a hyphen followed directly by word characters:
-\s(?:[\w\s/?,;!:#&@]+|-\w+)*
If you don't want to match a bare hyphen and whitespace with nothing after it, change the quantifier on the group from * to + so that it has to match 1 or more times.
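A minimal sketch of how the quantifier after the closing parenthesis changes the result. The sample text is made up for illustration: its first "- " is followed by "(", which neither alternative in the group can match.

```python
import re

# Hypothetical sample text, not from the question.
text = "Items: - (none) - Abcd-ef gh"

zero_or_more = r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)*'  # group may match zero times
one_or_more = r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)+'   # group must match at least once

loose = re.findall(zero_or_more, text)
strict = re.findall(one_or_more, text)

print(loose)   # includes the bare "- " before "(none)"
print(strict)  # only the item that has content after "- "
```

With `*` the bare "- " counts as a match on its own; with `+` it is skipped.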
Example
import re
regx = r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)+'
z = re.findall(regx, 'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd')
for p in z:
    print(p)
Output
- Abcd efgh ejklm
- Efgh-ij sklrm, defasad
- KLMNNOP/QRS dasfdssa eadsd
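Because `\s` is part of the character class, each match except the last also picks up the space that precedes the next "- ". If that matters, one option (a sketch, not the only way) is to strip trailing whitespace from every match:

```python
import re

regx = r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)+'
text = ('We need the list fo the following products: '
        '- Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd')

# rstrip() drops the trailing space captured before the next "- "
items = [m.rstrip() for m in re.findall(regx, text)]

for item in items:
    print(item)
```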
Or, for a slightly broader match that allows more than just word characters directly after the hyphen:
-\s(?:[\w\s/?,;!:#&@]+|-[\w/?,;!:#&@]+)*
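To see where the two variants differ, here is a small comparison on a made-up string in which a hyphen is followed by "#" rather than a word character. The sample text and variable names are assumptions for illustration.

```python
import re

# Hypothetical sample text: "-#42" has a non-word character right after the hyphen.
text = "- Item-#42 blue - Other-x"

narrow = r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)*'               # hyphen must be followed by word characters
broad = r'-\s(?:[\w\s/?,;!:#&@]+|-[\w/?,;!:#&@]+)*'     # hyphen may be followed by any listed character

narrow_matches = re.findall(narrow, text)
broad_matches = re.findall(broad, text)

print(narrow_matches)  # first item is cut at "-#"
print(broad_matches)   # first item keeps "-#42 blue "
```

The broader class still excludes whitespace after the hyphen, so " - " keeps acting as the separator between items.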