Home > database >  Use Regex to Separate Pairs of Title and URL
Use Regex to Separate Pairs of Title and URL

Time:10-27

I copied a list of books and their URL from a website that becomes one string when pasted in a word doc and I'd like to separate each Title and URL on new lines:

Copied list:

Elementary Algebra enter image description here https://regexr.com/

Also, in python it returns only the first match and I can't figure out how to group it with () and then get all of them with *

CodePudding user response:

Try (regex101):

import re

s = """Elementary Algebra https://amzn.to/3S7yG0Y Pre-Algebra https://amzn.to/3TpW8HK Discrete Mathematical Structures https://amzn.to/3eBYogb Discrete Mathematics and its Applications https://amzn.to/3TvfThe Discrete and Combinatorial Mathematics https://amzn.to/3CELUfO"""

pat = re.compile(r"\s*(.*?)\s (https?://\S )")

print(pat.findall(s))

Prints:

[
    ("Elementary Algebra", "https://amzn.to/3S7yG0Y"),
    ("Pre-Algebra", "https://amzn.to/3TpW8HK"),
    ("Discrete Mathematical Structures", "https://amzn.to/3eBYogb"),
    ("Discrete Mathematics and its Applications", "https://amzn.to/3TvfThe"),
    ("Discrete and Combinatorial Mathematics", "https://amzn.to/3CELUfO"),
]
  • Related