I have a string stored in the variable text. When I run print(text), I get this output:
SHIP TO
Flensburg House, MMDA Colony,
Arumbakkam,Chennai, Tamil Nadu,
I need to get the text:
Flensburg House, MMDA Colony,
Arumbakkam,Chennai, Tamil Nadu,
Here's what I have tried:
import re

shipto=[]
shipto_re=re.compile(r"SHIP TO((?:.*\n){1,3})")
for line in text.split():
    if shipto_re.match(line):
        shipto.append(line)
However, this isn't giving me a match. I know the regex works, so the problem must lie in how I iterate through the text variable.
Answer:
You are using a regex that matches across lines, but you split the string on whitespace and test each resulting "token" against the regex, so the pattern never sees "SHIP TO" or a newline.
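To see the problem directly, here is a quick sketch that prints the tokens text.split() produces for the sample address from the question. No single token can match a pattern that starts with SHIP TO (note the space) and expects newlines.

```python
import re

text = """SHIP TO
Flensburg House, MMDA Colony,
Arumbakkam,Chennai, Tamil Nadu,
"""
shipto_re = re.compile(r"SHIP TO((?:.*\n){1,3})")

# str.split() with no argument splits on ANY whitespace (spaces and newlines),
# so the regex only ever sees single words such as 'SHIP' or 'TO'.
tokens = text.split()
print(tokens[:4])  # ['SHIP', 'TO', 'Flensburg', 'House,']

# No token contains "SHIP TO" followed by a newline, so match() never succeeds.
print(any(shipto_re.match(tok) for tok in tokens))  # False
```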
Instead, run the regex over the whole string:
import re
text = r'''SHIP TO
Flensburg House, MMDA Colony,
Arumbakkam,Chennai, Tamil Nadu,
'''
shipto_re=re.compile(r"SHIP TO((?:.*\n){1,3})")
shipto = [x.strip() for x in shipto_re.findall(text)]
print(shipto)
# => ['Flensburg House, MMDA Colony,\nArumbakkam,Chennai, Tamil Nadu,']
Here, Pattern.findall is used to extract the Group 1 values from the matches, and each match is stripped of leading and trailing whitespace with str.strip().
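For reference, re.findall returns the whole matched text when the pattern has no capturing groups, but only the Group 1 text when it has exactly one. This short sketch (same sample text, patterns chosen only for illustration) shows the difference, and why strip() is needed to drop the newline right after SHIP TO.

```python
import re

text = "SHIP TO\nFlensburg House, MMDA Colony,\nArumbakkam,Chennai, Tamil Nadu,\n"

# No capturing group: findall returns the whole match, "SHIP TO" included.
whole = re.findall(r"SHIP TO(?:.*\n){1,3}", text)

# One capturing group: findall returns only what Group 1 captured.
group1 = re.findall(r"SHIP TO((?:.*\n){1,3})", text)

print(whole[0].startswith("SHIP TO"))  # True
print(repr(group1[0]))
# '\nFlensburg House, MMDA Colony,\nArumbakkam,Chennai, Tamil Nadu,\n'
```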
More considerations
If you also want to match the last line when it is not followed by a newline (i.e., when it sits at the very end of the string), replace the regex with
shipto_re=re.compile(r"SHIP TO(.*(?:\n.*){0,2})")
The SHIP TO(.*(?:\n.*){0,2}) pattern matches SHIP TO and then captures into Group 1, with (.*(?:\n.*){0,2}), any text up to the end of the current line, followed by zero, one or two sequences of a newline (LF) char plus the rest of that next line. Because each \n now comes before the .*, the last line does not need a trailing newline.
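A small sketch comparing the two patterns on the sample address, assuming the string has no newline after its last line:

```python
import re

# Same address, but with NO newline after the last line.
text = "SHIP TO\nFlensburg House, MMDA Colony,\nArumbakkam,Chennai, Tamil Nadu,"

needs_newline = re.compile(r"SHIP TO((?:.*\n){1,3})")
end_friendly = re.compile(r"SHIP TO(.*(?:\n.*){0,2})")

# Every repetition of (?:.*\n) must end in \n, so the last line is dropped.
print(needs_newline.findall(text))
# ['\nFlensburg House, MMDA Colony,\n']

# Here \n comes before .*, so the final line needs no trailing newline.
print(end_friendly.findall(text))
# ['\nFlensburg House, MMDA Colony,\nArumbakkam,Chennai, Tamil Nadu,']
```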
Answer:
Here is some sample code:
import re

regex = r"SHIP TO(.*)"
test_str = ("SHIP TO\n"
            "Flensburg House, MMDA Colony,\n"
            "Arumbakkam,Chennai, Tamil Nadu,")

# re.DOTALL lets . match newlines, so (.*) captures everything after "SHIP TO"
matches = re.finditer(regex, test_str, re.DOTALL)

for matchNum, match in enumerate(matches, start=1):
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1  # group numbers start at 1
        lines = match.group(groupNum).strip().split("\n")
        print(lines)
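The key here is the re.DOTALL flag: it makes . match newline characters too, so (.*) captures everything after SHIP TO up to the end of the string.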
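A minimal sketch of what changes with and without re.DOTALL on the same sample string:

```python
import re

test_str = "SHIP TO\nFlensburg House, MMDA Colony,\nArumbakkam,Chennai, Tamil Nadu,"

# Without re.DOTALL, . stops at the first newline; the rest of the "SHIP TO"
# line is empty, so Group 1 captures nothing.
plain = re.search(r"SHIP TO(.*)", test_str)
print(repr(plain.group(1)))  # ''

# With re.DOTALL, . also matches \n, so Group 1 runs to the end of the string.
dotall = re.search(r"SHIP TO(.*)", test_str, re.DOTALL)
print(dotall.group(1).strip().split("\n"))
# ['Flensburg House, MMDA Colony,', 'Arumbakkam,Chennai, Tamil Nadu,']
```

Since a greedy .* under re.DOTALL runs to the end of the string, any text that follows the address would be captured as well; the line-limited patterns in the first answer avoid that.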
Answer:
Your regex is correct.
I believe your issue is the use of text.split(), which by default splits on any whitespace, so the regex is tried against each word individually.
Instead, simply use findall on the whole string.
import re
text="""SHIP TO
Flensburg House, MMDA Colony,
Arumbakkam,Chennai, Tamil Nadu,
"""
shipto_re = re.compile(r"SHIP TO((?:.*\n){1,3})")
shipto = shipto_re.findall(text)
print(shipto)
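The findall result still contains the newline right after SHIP TO and the trailing newline. If you want the individual address lines, a small post-processing step like this sketch works (the name address_lines is just illustrative):

```python
import re

text = """SHIP TO
Flensburg House, MMDA Colony,
Arumbakkam,Chennai, Tamil Nadu,
"""
shipto_re = re.compile(r"SHIP TO((?:.*\n){1,3})")

# Strip each captured block, then split it into individual address lines.
address_lines = [block.strip().splitlines() for block in shipto_re.findall(text)]
print(address_lines)
# [['Flensburg House, MMDA Colony,', 'Arumbakkam,Chennai, Tamil Nadu,']]
```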