I understand why an IndexError occurs in general, but I thought I had covered my bases in this function.
The function searches a folder structure and outputs the matching line, plus the line before and the line after if they exist. It works for most terms, but for some it raises an IndexError.
import os
import re

def pattern_search(x, pattern):
    fname = x['Search File']
    file = os.path.join(DATA, fname)
    match = ""
    if os.path.exists(file):
        match = extract_match(file, pattern)
    else:
        match = "File NOT FOUND"
    return match

def extract_match(file, pattern):
    contents = open(file, encoding="ISO-8859-1").read()
    if re.search(pattern, contents):
        lines = contents.splitlines()
        match = ""
        i = 0  # set to 1 once the first match has been handled
        for index, line in enumerate(lines):
            if i < 1:
                if re.search(pattern, line):
                    i = 1
                    line = f"MATCH: ({str(index)}) {line}"
                    if lines[index - 1]:
                        line = f"PREV: {lines[index - 1]}" + "\n" + line
                    if lines[index + 1]:
                        line += "\n" + f"POST: {lines[index + 1]}"
                    match = line
                else:
                    pass
    else:
        match = "NF"
    #print(match)
    return match
Run as follows:
df["term1"] = df.apply(pattern_search, args=[term1_pat], axis=1)
For most terms, it will return the matching line with context:
PREV: I like cake
MATCH: This is a cake related matching sentence with cake term: batter
POST: mix 3 cups of regex with butter and add cream cheese.
I assume this happens with files that have only a few lines, or when the match occurs at the very end or beginning of a file. How should I account for these conditions?
CodePudding user response:
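This happens on the lines where you test whether lines[index - 1] and lines[index + 1] are falsy without first checking that those indices exist. When the match is on the last line, lines[index + 1] is out of range and raises IndexError. When the match is on the first line, lines[index - 1] becomes lines[-1], which does not raise but silently returns the last line of the file as the "previous" line.
Make sure the index is above zero before you decrement, and that it is below the last valid index before you increment.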
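To see the two cases in isolation, here is a small sketch using a made-up three-element list (the list contents are purely illustrative and not taken from your files):

```python
lines = ["first", "second", "third"]

# Looking "before" index 0 does not raise: -1 wraps around to the last element.
wrapped = lines[0 - 1]

# Looking "after" the last index is out of range and raises IndexError.
last = len(lines) - 1
try:
    lines[last + 1]
    raised = False
except IndexError:
    raised = True
```

So only the "after" lookup produces the error you are seeing, but the "before" lookup still needs a guard to avoid returning the wrong line.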
Replace
    if lines[index - 1]:
    if lines[index + 1]:
With
    if index > 0 and lines[index - 1]:
    if index < len(lines) - 1 and lines[index + 1]:
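Putting the two guards together, a minimal sketch of the matching step might look like the following. The function name match_with_context is hypothetical, and it takes the already split lines rather than a file path so that it can be tried without any files on disk; it is one possible arrangement, not the only way to structure it.

```python
import re

def match_with_context(lines, pattern):
    """Return the first matching line with its neighbours, or "NF" if none match."""
    for index, line in enumerate(lines):
        if re.search(pattern, line):
            parts = []
            # Only look back when a previous line exists and is non-empty.
            if index > 0 and lines[index - 1]:
                parts.append(f"PREV: {lines[index - 1]}")
            parts.append(f"MATCH: ({index}) {line}")
            # Only look ahead when a following line exists and is non-empty.
            if index < len(lines) - 1 and lines[index + 1]:
                parts.append(f"POST: {lines[index + 1]}")
            return "\n".join(parts)
    return "NF"
```

Returning as soon as the first match is found also removes the need for the i flag, and a single-line file or a match on the first or last line simply yields fewer context lines instead of an error.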