Python regex to match a person's height

Time:12-24

I am trying to create a Python regex that will match a person's height in feet and inches, separated by a single apostrophe (6'0, for example). Valid heights for my purposes are between 4'0 and 6'11. Here's what I have so far:

import re
import requests
url = 'https://rolltide.com/sports/football/roster'
re.findall('''([456][']([02-9]|1[0-1]?))''', (requests.get(url)).text)

This regex returns the following (I will just show the first few matches):

[("6'1", '1'),
 ("6'2", '2'),
 ("6'1", '1'),
 ("6'2", '2'),
 ("6'1", '1'),
 ("6'4", '4'),
 ("6'1", '1'),
 ("6'1", '1'),
 ("6'2", '2'),
 ("6'3", '3'),
 ("6'0", '0'),
 ("6'1", '1'),
 ("6'2", '2'),
 ("6'2", '2'),
 ("6'0", '0'),
 ("6'1", '1'),
 ("6'0", '0'),
 ("5'10", '10'),
 ...
 ]

I would like for the regex to return the following instead:

["6'1",
 "6'2",
 "6'1",
 "6'2",
 "6'1",
 "6'4",
 "6'1",
 "6'1",
 "6'2",
 "6'3",
 "6'0",
 "6'1",
 "6'2",
 "6'2",
 "6'0",
 "6'1",
 "6'0",
 "5'10",
 ... 
 ]

I am really not sure what the issue is. I am new to regex, but I think it has to do with how I am using the parentheses.

CodePudding user response:

Use a non-capturing group, so that findall returns the whole match instead of a tuple of groups. The pattern is [4-6]'(?:[0-9]|1[0-1])":

import re
import requests
url = 'https://rolltide.com/sports/football/roster'
re.findall(r'''[4-6]'(?:[0-9]|1[0-1])"''', requests.get(url).text)

This regex pattern says to match:

[4-6]'      4-6 feet
(?:
    [0-9]   0-9 inches
    |       OR
    1[0-1]  10-11 inches
)"

Here is a demo showing that the regex is working.
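
For a quick check that does not depend on the roster page, here is a minimal sketch that runs the same pattern against a made-up sample string. The sample text and the names sample, height_re and heights are assumptions for illustration only; re.VERBOSE is used just so each part of the pattern can carry a comment.

```python
import re

# Hypothetical sample text standing in for the roster page (not real data).
sample = """QB 6'1" 210 lbs; WR 5'10" 185 lbs; K 5'9" 180 lbs; C 7'2" 300 lbs"""

# Same pattern as in the answer, written in verbose mode.
height_re = re.compile(r"""
    [4-6]'        # 4-6 feet, then the apostrophe
    (?:           # non-capturing group, so findall returns whole matches
        [0-9]     # 0-9 inches
        |         # OR
        1[0-1]    # 10-11 inches
    )
    "             # closing double quote (inch mark)
""", re.VERBOSE)

heights = height_re.findall(sample)
print(heights)
```

Note that each match includes the trailing double quote, and 7'2" is skipped because 7 is outside [4-6]. If the quote is not wanted in the result, one option is to strip it afterwards, for example with [h.rstrip('"') for h in heights].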

CodePudding user response:

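The problem is that you are creating two capturing groups, one for the whole height and another for the inches alone. When a pattern contains more than one group, re.findall returns a tuple of the groups for each match, which is why you are getting the height and the inches separately.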
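
The difference can be seen on a small example. This is only a sketch: the sample string and the variable names are made up, and the pattern is a lightly simplified variant of the one in the question, not the exact original.

```python
import re

# Hypothetical sample text (not taken from the roster page).
sample = "6'1 5'10 6'0 4'11"

# Two capturing groups: findall returns one tuple per match,
# holding (whole height, inches).
with_groups = re.findall(r"([456]'(1[01]|[0-9]))", sample)

# Outer parentheses dropped and inner group made non-capturing with (?:...):
# findall now returns the whole match as a plain string.
# 1[01] is listed before [0-9] so that 10 and 11 are matched in full
# even when nothing follows the inches to force backtracking.
without_groups = re.findall(r"[456]'(?:1[01]|[0-9])", sample)

print(with_groups)
print(without_groups)
```

The first call yields tuples such as ("5'10", '10'), while the second yields the flat list of strings the question asks for.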
