I copied a list of books and their URLs from a website. When pasted into a Word document it becomes one string, and I'd like to put each title and URL on its own line:
Copied list:
Elementary Algebra https://regexr.com/
Also, in Python it returns only the first match, and I can't figure out how to group it with () and then get all of them with *
CodePudding user response:
Try (regex101):
import re

s = """Elementary Algebra https://amzn.to/3S7yG0Y Pre-Algebra https://amzn.to/3TpW8HK Discrete Mathematical Structures https://amzn.to/3eBYogb Discrete Mathematics and its Applications https://amzn.to/3TvfThe Discrete and Combinatorial Mathematics https://amzn.to/3CELUfO"""

# Group 1: the title (lazy match); group 2: the URL, up to the next whitespace.
pat = re.compile(r"\s*(.*?)\s+(https?://\S+)")

# findall returns every (title, url) pair, not just the first match.
print(pat.findall(s))
Prints:
[
    ("Elementary Algebra", "https://amzn.to/3S7yG0Y"),
    ("Pre-Algebra", "https://amzn.to/3TpW8HK"),
    ("Discrete Mathematical Structures", "https://amzn.to/3eBYogb"),
    ("Discrete Mathematics and its Applications", "https://amzn.to/3TvfThe"),
    ("Discrete and Combinatorial Mathematics", "https://amzn.to/3CELUfO"),
]
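To get from the list of pairs to the layout asked for in the question (each title and each URL on its own line), something like the following might work. It is only a sketch: the helper name `split_titles_and_urls` is made up for illustration, the sample string is a shortened copy of the one above, and the pattern assumes a title and its URL are separated by at least one whitespace character. It also shows why only one result came back before: `re.search` stops at the first match, while `findall` and `finditer` walk the whole string.

```python
import re

# Shortened sample input: titles and URLs run together in one string.
s = (
    "Elementary Algebra https://amzn.to/3S7yG0Y "
    "Pre-Algebra https://amzn.to/3TpW8HK "
    "Discrete Mathematical Structures https://amzn.to/3eBYogb"
)

# Lazy title, one or more whitespace characters, then a URL up to the next whitespace.
pat = re.compile(r"\s*(.*?)\s+(https?://\S+)")


def split_titles_and_urls(text):
    """Hypothetical helper: put every title and every URL on its own line."""
    lines = []
    for match in pat.finditer(text):  # finditer yields all matches, in order
        title, url = match.groups()
        lines.append(title)
        lines.append(url)
    return "\n".join(lines)


result = split_titles_and_urls(s)
print(result)

# For comparison: search returns only the first match.
first = pat.search(s)
```

The text in `result` can then be pasted back into the document, or written to a file if that is more convenient.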