I am trying to create a Python regex that will match a person's height in feet and inches, separated by a single apostrophe (6'0, for example). For my purposes, valid heights are between 4'0 and 6'11. Here's what I have so far:
import re
import requests
url = 'https://rolltide.com/sports/football/roster'
re.findall('''([456][']([02-9]|1[0-1]?))''', (requests.get(url)).text)
This regex returns the following (I will just show the first few matches):
[("6'1", '1'),
("6'2", '2'),
("6'1", '1'),
("6'2", '2'),
("6'1", '1'),
("6'4", '4'),
("6'1", '1'),
("6'1", '1'),
("6'2", '2'),
("6'3", '3'),
("6'0", '0'),
("6'1", '1'),
("6'2", '2'),
("6'2", '2'),
("6'0", '0'),
("6'1", '1'),
("6'0", '0'),
("5'10", '10'),
...
]
I would like for the regex to return the following instead:
["6'1",
"6'2",
"6'1",
"6'2",
"6'1",
"6'4",
"6'1",
"6'1",
"6'2",
"6'3",
"6'0",
"6'1",
"6'2",
"6'2",
"6'0",
"6'1",
"6'0",
"5'10",
...
]
I am really not sure what the issue is. I am new to regex, but I think it has to do with how I am using parentheses.
CodePudding user response:
Just use the pattern [4-6]'(?:[0-9]|1[0-1])":
import re
import requests
url = 'https://rolltide.com/sports/football/roster'
# Triple-quoted raw string, because the pattern contains both ' and "
re.findall(r'''[4-6]'(?:[0-9]|1[0-1])"''', requests.get(url).text)
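This regex pattern says to match:
[4-6]'      4 to 6 feet, followed by an apostrophe
(?:         start of a non-capturing group
  [0-9]     0 to 9 inches
  |         OR
  1[0-1]    10 or 11 inches
)           end of the non-capturing group
"           a literal double quote (the inch mark)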
Here is a demo showing that the regex is working.
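As a rough check that does not depend on the live roster page, the pattern can be tried on a short string. The sample text below is made up for illustration, and it assumes the heights are followed by a double quote. If the page does not include the inch mark, a variant without the trailing quote could be used instead, with 1[01] listed first so that 10 and 11 are not cut short to 1.

```python
import re

# Hypothetical sample text standing in for the roster page.
sample = "QB 6'2\" 215 lbs, WR 5'10\" 182 lbs, K 5'9\" 170 lbs"

# The pattern from this answer: requires a double quote after the inches.
with_mark = re.findall(r'''[4-6]'(?:[0-9]|1[0-1])"''', sample)

# Variant without the trailing quote. The two-digit alternative comes
# first, otherwise "5'10" would be matched as "5'1".
without_mark = re.findall(r"[4-6]'(?:1[01]|[0-9])", sample)

print(with_mark)     # each match includes the trailing "
print(without_mark)  # each match stops after the inches
```

With the trailing quote in the pattern, the regex engine backtracks from [0-9] to 1[0-1] on its own, so the order of the alternatives only matters in the second variant.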
CodePudding user response:
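The problem is that your pattern contains two capturing groups: the outer one matches the whole height and the inner one matches the inches alone. When a pattern has more than one capturing group, re.findall returns a tuple of the groups for each match, which is why you are getting the height and the inches separately.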
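A minimal sketch of the difference, using a made-up sample string instead of the roster page: the first call uses the pattern from the question, and the second drops the outer parentheses and turns the inner group into a non-capturing one with (?:...).

```python
import re

# Hypothetical sample text; the real input would be the page source.
sample = "6'1 6'2 5'10"

# Pattern from the question: two capturing groups, so findall
# returns one (height, inches) tuple per match.
tuples = re.findall(r"([456][']([02-9]|1[0-1]?))", sample)
print(tuples)

# Same pattern with no capturing groups: findall returns the
# whole match as a plain string.
flat = re.findall(r"[456]['](?:[02-9]|1[0-1]?)", sample)
print(flat)
```

If you would rather keep the original pattern unchanged, taking the first element of each tuple gives the same list of heights.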