I have some strings containing Unicode characters, like the one below:
رده سنی مجاز :
10.2-15.3
8.71-9.13
25.08 - 31.2
زده های سنی غیرمجاز:
16.5-18.4
9.15 - 10.02
20.02-21.30
I want to match only the first group of number ranges, like below:
10.2-15.3
8.71-9.13
25.08-31.2
and I'm using the following code:
print(re.findall('رده سنی مجاز :.*(.*\d+.\d+-\d+.\d+.*)', string, re.DOTALL))
but it returns:
['25.08-31.2']
Answer:
I suggest extracting everything after the fixed text up to the first blank line, and then splitting the extracted part into separate lines:
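import re

# Fixed header, optional trailing whitespace, a newline, then a block of non-empty lines.
p = r"رده سنی مجاز :\s*\n(.+(?:\n.+)*)"
text = "رده سنی مجاز : \n 10.2-15.3\n 8.71-9.13\n 25.08 - 31.2\n\nزده های سنی غیرمجاز:\n 16.5-18.4\n 9.15 - 10.02\n 20.02-21.30"
m = re.search(p, text)
if m:
    # Group 1 holds the whole block; split it into lines and trim each one.
    print([x.strip() for x in m.group(1).splitlines()])
# => ['10.2-15.3', '8.71-9.13', '25.08 - 31.2']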
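Details:
رده سنی مجاز : - a fixed string
\s* - zero or more whitespace characters
\n - a newline
(.+(?:\n.+)*) - Group 1: one non-empty line (.+) followed by zero or more further non-empty lines ((?:\n.+)*). Without re.DOTALL, . does not match a newline, so the match stops at the first blank line.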
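The expected output in the question has no spaces around the hyphen (25.08-31.2), while splitting the captured block into lines keeps them (25.08 - 31.2). The following is a minimal sketch of one way to normalise that, assuming every range is two non-negative decimal numbers separated by a hyphen. The helper name extract_ranges and the RANGE pattern are hypothetical and not part of the original answer.

```python
import re

# Hypothetical names for illustration; only the header text comes from the question.
HEADER = "رده سنی مجاز :"
# Assumption: a range is "number - number", with optional spaces around the hyphen.
RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)")


def extract_ranges(text, header=HEADER):
    """Return the ranges listed under `header`, up to the first blank line."""
    # Same idea as the answer's pattern: header, newline, block of non-empty lines.
    block = re.search(re.escape(header) + r"\s*\n(.+(?:\n.+)*)", text)
    if block is None:
        return []
    # Rebuild each range as "low-high", dropping any spaces around the hyphen.
    return [f"{low}-{high}" for low, high in RANGE.findall(block.group(1))]


text = "رده سنی مجاز : \n 10.2-15.3\n 8.71-9.13\n 25.08 - 31.2\n\nزده های سنی غیرمجاز:\n 16.5-18.4\n 9.15 - 10.02\n 20.02-21.30"
print(extract_ranges(text))
# => ['10.2-15.3', '8.71-9.13', '25.08-31.2']
```

Passing a different header string to extract_ranges would pull the ranges under that header instead, provided the block is again terminated by a blank line or the end of the string.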