When I run this script I can verify that it loops through all of the values, but not all of them get passed into my dictionary
file = open('path', 'rb')
readFile = PyPDF2.PdfFileReader(file)
lineData = {}
totalPages = readFile.numPages
for i in range(totalPages):
pageObj = readFile.getPage(i)
pageText = pageObj.extractText
newTrans = re.compile(r'Jan \d{2,}')
for line in pageText(pageObj).split('\n'):
if newTrans.match(line):
newValue = re.split(r'Jan \d{2,}', line)
newValueStr = ' '.join(newValue)
newKey = newTrans.findall(line)
newKeyStr = ' '.join(newKey)
print(newKeyStr newValueStr)
lineData[newKeyStr] = newValueStr
print(len(lineData))
There are 80 data pairs but when I run this the dict only gets 37
CodePudding user response:
Well, duplicate keys, maybe? Try to make lineData = [] and append there: lineData.append({newKeyStr:newValueStr} and then check how many records you get.