Regex pattern to match multiple characters and split

Time:11-11

I haven't used regex much and was having issues trying to split out 3 specific pieces of info in a long list of text I need to parse.

note = "**Jane Greiz** `#1`: Should be open here .\n**Thomas Fitzpatrick** `#90`: Anim: Can we start the movement.\n**Anthony Smith** `#91`: Her left shoulder.\nhttps://google.com"
  1. pattern1 = Parse the **Name Text**
  2. pattern2 = Parse the number `#x`
  3. pattern3 = Grab everything else until the next pattern 1

What I have doesn't work well: the results contain empty elements, and the name and number are not grouped together. I also can't figure out how to grab the remaining text without it affecting the first 2 patterns. I'd like all 3 matches together in one tuple rather than separated. Here's what I have so far:

import re

all = r"\*\*(.+?)\*\*|\`#(.+?)\`:"
l = re.findall(all, note)

Output:

[('Jane Greiz', ''), ('', '1'), ('Thomas Fitzpatrick', ''), ('', '90'), ('Anthony Smith', ''), ('', '91')]

Answer:

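Don't use alternation. With |, each match satisfies only one branch, so the group from the other branch comes back as an empty string. Instead, put the name and number patterns one after the other in a single pattern, and add a third group for the text that follows. Here that group runs to the end of the line, which is just before the next **.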

import re

note = "**Jane Greiz** `#1`: Should be open here .\n**Thomas Fitzpatrick** `#90`: Anim: Can we start the movement.\n**Anthony Smith** `#91`: Her left shoulder.\nhttps://google.com"
all = r"\*\*(.+?)\*\*.*?\`#(.+?)\`:(.*)"
print(re.findall(all, note))

Output is:

[('Jane Greiz', '1', ' Should be open here .'), ('Thomas Fitzpatrick', '90', ' Anim: Can we start the movement.'), ('Anthony Smith', '91', ' Her left shoulder.')]
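
Note that the third group above stops at the end of each line, so the trailing URL after the last entry is not captured. If the remaining text should run all the way to the next **Name**, even across line breaks, one possible variation is sketched below. It assumes that names never contain an asterisk, that the number is digits only, and that every new entry starts at the beginning of a line; the names pattern and records are illustrative and not part of the original answer.

```python
import re

note = (
    "**Jane Greiz** `#1`: Should be open here .\n"
    "**Thomas Fitzpatrick** `#90`: Anim: Can we start the movement.\n"
    "**Anthony Smith** `#91`: Her left shoulder.\nhttps://google.com"
)

# Assumed variation, not the original answer's pattern:
#   group 1: the name between ** ... ** (no asterisks allowed inside)
#   group 2: the digits after '#'
#   group 3: everything up to (but not including) a newline followed by '**',
#            or the end of the string; DOTALL lets '.' cross line breaks
pattern = re.compile(
    r"\*\*([^*]+)\*\*\s*`#(\d+)`:\s*(.*?)(?=\n\*\*|\Z)",
    re.DOTALL,
)

records = pattern.findall(note)
for name, number, text in records:
    print((name, number, text))
```

With this pattern the leading space of each comment is dropped by the \s* before the third group, and the last tuple keeps the text following "Her left shoulder." because the lookahead only stops at the next ** or at the end of the string.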