How to create a list of tuples containing matches from multiple regular expressions

Time:10-19

I am currently working on an assignment that requires us to extract phone numbers, emails, and websites from a text document. The lecturer wants the output as a list of tuples, each containing the start index, the length, and the matched text. For example: [(1,10,'0909900008'), (35,16,'[email protected]'), ...]. Since there are three different kinds of match, how can I put all of them into one list of tuples? I have written the three regular expressions, but I can't work out how to combine their results in a single list. Should I create a new expression that describes all three? Thanks for your help.

import re

# `string` holds the contents of the text document
result = []

# Match with RE
email_pattern = r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+'
email = re.findall(email_pattern, string)
for match in re.finditer(email_pattern, string):
    print(match.start(), match.end() - match.start(), match.group())

phone_pattern = r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
phone = re.findall(phone_pattern, string)
for match in re.finditer(phone_pattern, string):
    print(match.start(), match.end() - match.start(), match.group())

website_pattern = r'(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})'
web = re.findall(website_pattern, string)
for match in re.finditer(website_pattern, string):
    print(match.start(), match.end() - match.start(), match.group())

My output:

# Text document
should we use regex more often? let me know at [email protected] or [email protected]. To further notice, contact Khoi at 0957507468 or accessing
https://web.de or maybe www.google.com, or Mr.Q at 0912299922.

# Output
47 21 [email protected]
72 13 [email protected]
122 10 0957507468
197 10 0912299922
146 14 https://web.de
170 15 www.google.com,

CodePudding user response:

Rather than printing each match, append it to the result list and print the list once at the end, i.e. change

print(match.start(), match.end() - match.start(), match.group())

to

result.append((match.start(), match.end() - match.start(), match.group()))

and do the same in the other two loops, then at the end

print(result)
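
Putting it together, here is a minimal sketch of the append approach as one loop over all three patterns. The patterns are the ones from the question; the helper name extract_matches, the PATTERNS list, and the sample text are made up for illustration, and sorting by start index is an extra step the assignment may or may not require.

```python
import re

# Patterns from the question, kept in one list so a single loop covers them.
PATTERNS = [
    # email
    r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+',
    # phone
    r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}',
    # website
    r'(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}'
    r'|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}'
    r'|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}'
    r'|www\.[a-zA-Z0-9]+\.[^\s]{2,})',
]


def extract_matches(text, patterns):
    """Return (start, length, match) tuples for every pattern (hypothetical helper)."""
    result = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            result.append((match.start(), match.end() - match.start(), match.group()))
    # Tuples compare element by element, so this orders matches by start index.
    result.sort()
    return result


if __name__ == "__main__":
    # Made-up sample text, not the document from the question.
    sample = "mail bob@example.com or call 0957507468, see https://web.de now"
    print(extract_matches(sample, PATTERNS))
```

With this approach there is no need to merge the three expressions into one; each pattern stays separate and only the results share a list. Drop the sort if the matches should stay grouped by type, as in the printed output above.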