Hi everyone! I have a problem with a regular expression. I need to remove the list numbering from text like this:
"1. Important text"
"2. Important text"
"1.1 Important text"
"1.2 Important text"
"1.1.1 Important text"
"1.1.2 Important text"
All of these should become:
"Important text"
"Important text"
"Important text"
...
I need to do this in Python. I tried r'\d\.', but it leaves the last digit of the number behind (for example, "1.1 Important text" becomes "1 Important text").
CodePudding user response:
You can do a replacement on \d+(?:\.\d+)+:
import re

text = '1 General information 1.1 The purpose of the requirements is the introduction of an automated system for examination, preparation of expert opinions, draft decisions, risk assessment, 1.1.1 The requirements are formulated for the following customer segments of CB'
# Replace every dotted number (1.1, 1.1.1, ...) and the whitespace around it with a single space
output = re.sub(r'\s*\d+(?:\.\d+)+\s*', ' ', text).strip()
print(output)
This prints (the leading "1" is kept because it is not followed by a ".digit" part):
1 General information The purpose of the requirements is the introduction of an automated system for examination, preparation of expert opinions, draft decisions, risk assessment, The requirements are formulated for the following customer segments of CB
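If the numbering sits at the start of each line, as in the question's samples, and may end in a dot ("1.") or stand alone, a line-anchored variant might be easier to reason about. The sketch below is only an illustration: the pattern and the helper name `strip_numbering` are assumptions, not part of the answer above.

```python
import re

# Assumed pattern (not from the answer above): digits, optional ".digits" parts,
# an optional trailing dot, then at least one whitespace character,
# all anchored at the start of the string.
NUMBERING = re.compile(r'^\s*\d+(?:\.\d+)*\.?\s+')

def strip_numbering(line: str) -> str:
    """Remove a leading list number such as '1.', '1.2' or '1.1.1' (hypothetical helper)."""
    return NUMBERING.sub('', line)

samples = [
    "1. Important text",
    "2. Important text",
    "1.1 Important text",
    "1.2 Important text",
    "1.1.1 Important text",
    "1.1.2 Important text",
]

cleaned = [strip_numbering(s) for s in samples]
print(cleaned)
```

Keep in mind that this also strips any line that merely starts with a number followed by a space (for example "2024 was a good year"), so it only fits input where a leading number is always list numbering.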
CodePudding user response:
Try r"(?:\d\.?)+"
- (?:) → non-capturing (unnamed) group
- \d → a digit
- \.? → a literal "." that may or may not be present
- + → the group repeats one or more times
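As a rough usage sketch for this pattern: the `^` anchor, the trailing `\s*`, and the helper name `remove_prefix_number` below are additions for illustration, not part of the suggestion itself. Without the anchor the pattern matches any digit anywhere in the text, so it would also delete numbers inside the sentence.

```python
import re

# The suggested pattern, with two assumed additions:
# "^" so that only a number at the start is removed,
# and "\s*" to swallow the whitespace that follows it.
PREFIX = re.compile(r'^(?:\d\.?)+\s*')

def remove_prefix_number(line: str) -> str:
    """Strip a leading run of digits and dots (hypothetical helper)."""
    return PREFIX.sub('', line)

for line in ["1. Important text", "1.2 Important text", "1.1.2 Important text"]:
    print(remove_prefix_number(line))

# Unanchored, the same pattern also removes digits inside the text:
unanchored = re.sub(r'(?:\d\.?)+', '', "Section 2 of 3")
print(repr(unanchored))
```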