The text looks like "1-2years. 3years. 10years."
I want to get the result [(1,2),(3),(10)].
I am using Python.
I first tried r"([0-9]?)[-]?([0-9])years". It works well except for the case of 10. I also tried r"([0-9]?)[-]?([0-9]|10)years", but the result is still [(1,2),(3),(1,0)].
Answer:
This should work:
import re
st = '1-2years. 3years. 10years.'
result = [tuple(e for e in tup if e)
          for tup in re.findall(r'(?:(\d+)-(\d+)|(\d+))years', st)]
# [('1', '2'), ('3',), ('10',)]
The regex looks for either two numbers separated by a hyphen or a single number, immediately before the word years. Because the pattern has three capturing groups, re.findall() returns one three-element tuple per match: [('1', '2', ''), ('', '', '3'), ('', '', '10')]. A quick list comprehension then filters out the empty strings.
Alternatively, we could use r'(\d+)(?:-(\d+))?years' to the same effect (the same filter removes the empty second group), which is closer to what you've already tried.
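A quick check of that alternative pattern, reusing the same empty-string filter (the variable name raw is just illustrative):

```python
import re

st = '1-2years. 3years. 10years.'

# The first number is always captured; the "-<digits>" part is optional,
# so group 2 comes back as '' when there is no hyphen.
raw = re.findall(r'(\d+)(?:-(\d+))?years', st)
print(raw)  # [('1', '2'), ('3', ''), ('10', '')]

# Same filter as above: drop the empty strings from each tuple.
result = [tuple(e for e in tup if e) for tup in raw]
print(result)  # [('1', '2'), ('3',), ('10',)]
```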
Answer:
You can use this pattern: (?:(\d+)-)?(\d+)years
Code:
import re
pattern = r"(?:(\d+)-)?(\d+)years"
text = "1-2years. 3years. 10years."
print([tuple(int(z) for z in x if z) for x in re.findall(pattern, text)])
Output:
[(1, 2), (3,), (10,)]
Answer:
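Your attempt r"([0-9]?)[-]?([0-9])years" doesn't work for the case of 10 because each group matches at most one digit: ([0-9]?) matches zero or one digit and ([0-9]) exactly one. So for 10 the first group takes 1 and the second takes 0.
You also don't need the hyphen in brackets.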
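To see the failure concretely, here is a small sketch that runs both of the question's patterns on the sample text:

```python
import re

text = "1-2years. 3years. 10years."

# Each group can consume at most one digit, so "10" is split
# into group 1 = '1' and group 2 = '0'.
first = re.findall(r"([0-9]?)[-]?([0-9])years", text)

# Adding "|10" does not help: after group 1 has taken '1', the first
# alternative [0-9] already matches '0', so the "10" branch is never used.
second = re.findall(r"([0-9]?)[-]?([0-9]|10)years", text)

print(first)   # [('1', '2'), ('', '3'), ('1', '0')]
print(second)  # [('1', '2'), ('', '3'), ('1', '0')]
```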
This should work (demo on Regex101):
(\d+)(?:-(\d+))?years
Explanation:
(\d+) : capturing group for one or more digits
(?:...) : non-capturing group containing
- : a hyphen
(\d+) : capturing group for one or more digits
(?:...)? : makes the previous non-capturing group optional
years : the literal text years
In Python:
import re
result = re.findall(r"(\d+)(?:-(\d+))?years", "1-2years. 3years. 10years.")
# Gives: [('1', '2'), ('3', ''), ('10', '')]
Each tuple in the list contains two elements: the number before the hyphen (or the only number) and the number after the hyphen (an empty string when there is no hyphen). Removing the blank elements is easy: loop over each item in result, then loop over each match in that item, and keep it (converted to int) only if it is not empty.
final_result = [tuple(int(match) for match in item if match) for item in result]
# gives: [(1, 2), (3,), (10,)]
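As a variation (swapping re.findall for re.finditer, with group names that are purely illustrative), the same pattern can be read match by match. One difference worth knowing: re.findall reports '' for an optional group that did not participate, whereas Match.group() returns None for it.

```python
import re

text = "1-2years. 3years. 10years."
# Same pattern as above, with illustrative group names "low" and "high".
pattern = re.compile(r"(?P<low>\d+)(?:-(?P<high>\d+))?years")

result = []
for m in pattern.finditer(text):
    # m.group("high") is None when the optional "-<digits>" part is absent.
    low, high = m.group("low"), m.group("high")
    result.append((int(low),) if high is None else (int(low), int(high)))

print(result)  # [(1, 2), (3,), (10,)]
```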
Answer:
You only match a single digit because the character class [0-9] is not repeated.
Another option is to match the first run of digits, followed by an optional part consisting of - and more digits.
\b(\d+(?:-\d+)?)years\.
\b : a word boundary
( : start capture group 1 (which is what re.findall returns)
\d+(?:-\d+)? : match 1+ digits, optionally followed by - and again 1+ digits
) : close group 1
years\. : match years. literally, with the dot escaped
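A short sketch of what the word boundary and the escaped dot do in practice (the extra test strings "x10years." and "10yearsX" are made up for illustration):

```python
import re

pattern = r"\b(\d+(?:-\d+)?)years\."

# Group 1 captures "1-2", "3" and "10" as whole strings.
whole = re.findall(pattern, "1-2years. 3years. 10years.")
print(whole)  # ['1-2', '3', '10']

# \b requires a word boundary before the digits, so digits glued
# to a preceding letter ("x10") are not matched.
glued = re.findall(pattern, "x10years.")
print(glued)  # []

# The escaped dot must be a literal ".", so "10yearsX" is not matched.
no_dot = re.findall(pattern, "10yearsX")
print(no_dot)  # []
```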
Then you can split the matches on -:
import re
pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."
res = [tuple(v.split('-')) for v in re.findall(pattern, s)]
print(res)
Output
[('1', '2'), ('3',), ('10',)]
Or, if a list of lists is acceptable instead of tuples:
res = [v.split('-') for v in re.findall(pattern, s)]
Output
[['1', '2'], ['3'], ['10']]
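If you need integers, as in the question's expected result, a minimal sketch is to map int over each split:

```python
import re

pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."

# Split each captured string on "-" and convert every part to int.
res = [tuple(map(int, v.split('-'))) for v in re.findall(pattern, s)]
print(res)  # [(1, 2), (3,), (10,)]
```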