I have a function that recursively looks through a list of objects and collects the text of every object with a similar y0. This all works fine. However, when the function manipulates the list, it also changes the list that was passed in, even though I make a copy of the list to prevent exactly that.
This means that when you run the code below, tmp ends up with different text values. I know the problem is in line 26, where BOLD_OBJ["text"] is set to the output of the recursive function, but I'm not sure why, considering it should be manipulating the copy of the list.
def recursiveScanText(BOLD_OBJ_LIST: list, Y_VALUE: int, output: list):
    if BOLD_OBJ_LIST[0]["y0"] == Y_VALUE:
        output.append(BOLD_OBJ_LIST[0]["text"])
        BOLD_OBJ_LIST.pop(0)
        if BOLD_OBJ_LIST == []:
            return output
        output = recursiveScanText(BOLD_OBJ_LIST, Y_VALUE, output)
        return output
    else:
        return output

def mergeSimilarText(BOLD_OBJ_LIST: list):
    """Merges the objects of a list of objects if they are at a similar (±5) Y coordinate"""
    OUTPUT = []
    RECURSIVE_SCAN_OUTPUT = []
    BOLD_OBJ_LIST = BOLD_OBJ_LIST.copy()
    for BOLD_OBJ_INDEX in range(len(BOLD_OBJ_LIST)):
        if len(BOLD_OBJ_LIST) > 0 and BOLD_OBJ_INDEX < len(BOLD_OBJ_LIST):
            BOLD_OBJ = BOLD_OBJ_LIST[0]
            BOLD_CHAR_STRING = recursiveScanText(BOLD_OBJ_LIST, BOLD_OBJ_LIST[BOLD_OBJ_INDEX]["y0"], RECURSIVE_SCAN_OUTPUT)
            RECURSIVE_SCAN_OUTPUT = []
            BOLD_OBJ["text"] = "".join(BOLD_CHAR_STRING)
            OUTPUT.append(BOLD_OBJ)
    return OUTPUT

tmp = [
{'y0': 762.064, 'text': '177'},
{'y0': 762.064, 'text': '7'},
{'y0': 114.8281, 'text': 'Q'},
{'y0': 114.8281, 'text': 'u'},
{'y0': 114.8281, 'text': 'e'},
{'y0': 114.8281, 'text': 's'},
{'y0': 114.8281, 'text': 't'},
{'y0': 114.8281, 'text': 'i'},
{'y0': 114.8281, 'text': 'o'},
{'y0': 114.8281, 'text': 'n'},
{'y0': 114.8281, 'text': ' '},
{'y0': 114.8281, 'text': '1'},
{'y0': 114.8281, 'text': '7'},
{'y0': 114.8281, 'text': ' '},
{'y0': 114.8281, 'text': 'c'},
{'y0': 114.8281, 'text': 'o'},
{'y0': 114.8281, 'text': 'n'},
{'y0': 114.8281, 'text': 't'},
{'y0': 114.8281, 'text': 'i'},
{'y0': 114.8281, 'text': 'n'},
{'y0': 114.8281, 'text': 'u'},
{'y0': 114.8281, 'text': 'e'},
{'y0': 114.8281, 'text': 's'},
{'y0': 114.8281, 'text': ' '},
{'y0': 114.8281, 'text': 'o'},
{'y0': 114.8281, 'text': 'n'},
{'y0': 114.8281, 'text': ' '},
{'y0': 114.8281, 'text': 'p'},
{'y0': 114.8281, 'text': 'a'},
{'y0': 114.8281, 'text': 'g'},
{'y0': 114.8281, 'text': 'e'},
{'y0': 114.8281, 'text': ' '},
{'y0': 114.8281,'text': '9'}]
print(mergeSimilarText(tmp))
print(tmp)
Some notes: I have tried changing BOLD_OBJ_LIST = BOLD_OBJ_LIST.copy() to tmp = BOLD_OBJ_LIST.copy() and that still doesn't fix it. Also, I don't need to deepcopy as it is an array of dicts, not an array of arrays.
CodePudding user response:
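copy makes a shallow copy of your list: the new list is a separate object, but its elements are references to the same objects as in the original. Your elements are dicts, which are mutable, so assigning to BOLD_OBJ["text"] changes the very dict that tmp also holds. Removing items with pop only affects the copied list, which is why the length of tmp stays the same while its text values change.
You need to use deepcopy: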
from copy import deepcopy
BOLD_OBJ_LIST = deepcopy(BOLD_OBJ_LIST)
This will recursively create a copy of all your elements.
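To make the difference concrete, here is a minimal sketch using two entries from the question's tmp data. The variable names (original, shallow, deep, merged) are illustrative and not part of the original code. It also shows one possible alternative, assuming you would rather not copy the whole input: build a new dict for the merged result instead of assigning into an existing one.

```python
from copy import deepcopy

original = [{"y0": 762.064, "text": "177"}, {"y0": 762.064, "text": "7"}]

shallow = original.copy()   # new list, but it holds the same dict objects
deep = deepcopy(original)   # new list holding new, independent dicts

# Identity checks: does the copy's first element point at the original dict?
shares_dict_shallow = shallow[0] is original[0]
shares_dict_deep = deep[0] is original[0]

# Editing a dict in the deep copy leaves the original untouched.
deep[0]["text"] = "changed in deep copy"
text_after_deep_edit = original[0]["text"]

# Editing a dict reached through the shallow copy edits the original's dict too.
shallow[0]["text"] = "1777"
text_after_shallow_edit = original[0]["text"]

# Alternative sketch: create a new dict rather than mutating an existing one.
merged = {**original[1], "text": "1777"}
text_after_merge = original[1]["text"]

# Popping from the shallow copy changes only the copy's own list structure.
shallow.pop(0)
original_length = len(original)

print(shares_dict_shallow, shares_dict_deep)
print(text_after_deep_edit, text_after_shallow_edit, text_after_merge)
print(original_length)
```

If copying every dict is more than you need, the last approach may be enough on its own: in mergeSimilarText, appending a new dict built from BOLD_OBJ instead of assigning to BOLD_OBJ["text"] would leave the dicts in tmp unchanged.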