Python: For loop only iterates once - also using a with statement

Time:04-01

I am trying to open a zip file and iterate through the PDFs inside it. I want to scrape a certain portion of the text in each PDF. I am using the following code:

import re
import zipfile

import pdftotext

def get_text(part):
    #Create path
    path = f'C:\\Users\\user\\Data\\Part_{part}.zip'
    
    with zipfile.ZipFile(path) as data:
        listdata = data.namelist()
        onlypdfs = [k for k in listdata if '_2018' in k or '_2019' in k or '_2020' in k or '_2021' in k or '_2022' in k]

        for file in onlypdfs:
            with data.open(file, "r") as f:
                #Get the pdf
                pdffile = pdftotext.PDF(f)
                text = ("\n\n".join(pdffile))

    
                #Remove the newline characters
                text = text.replace('\r\n', ' ')
                text = text.replace('\r', ' ')
                text = text.replace('\n', ' ')
                text = text.replace('\x0c', ' ')

                #Get the text that will talk about what I want
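                try:
                    #Keep the last "FEES ... Types" match in the document
                    text2 = re.findall(r'FEES (.*?) Types', text, re.IGNORECASE)[-1]

                except IndexError:
                    #findall returned an empty list: no match in this PDF
                    text2 = 'PROBLEM'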

                #Return the file name and the text
                return file, text2

Then in the next line I am running:

info = []
for i in range(1,2):
    info.append(get_text(i))
info

My output contains only the first file and its text, even though the zip file contains 4 PDFs. Ideally, I want it to iterate through all 30 zip files, but I am having trouble with just one. I've seen this question asked before, but the solutions didn't fit my problem. Is it something with the with statement?

Answer:

You need to process all the files, store each result as you iterate, and return only after the loop has finished. One way to do this is to store them in a list of tuples:

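file_list = []
for file in onlypdfs:
    ...  # open the PDF and extract text2 exactly as in the question
    file_list.append((file, text2))
return file_list  # after the loop, so every PDF is processed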

You could then use this like so:

info = []
for i in range(1, 2):
    results = get_text(i)  # named "results" to avoid shadowing the built-in list
    info.extend(results)   # add every (file, text) tuple from this zip file
print(info)
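A variant of the same idea, sketched under a few assumptions: instead of building a list, the function can yield one (file, text2) pair per PDF, which also keeps the loop running over every member. To keep the sketch runnable without real PDFs or the pdftotext package, it builds a small zip file in memory and uses a hypothetical extract_text stand-in that decodes each member as UTF-8; in the real code that step would be "\n\n".join(pdftotext.PDF(f)). The names iter_fees, extract_text and YEARS are illustrative only. The whitespace handling is a simplification of the question's replace calls: it also collapses runs of spaces into one.

```python
import io
import re
import zipfile

YEARS = ("_2018", "_2019", "_2020", "_2021", "_2022")

def extract_text(f):
    # Hypothetical stand-in for "\n\n".join(pdftotext.PDF(f)):
    # here each archive member is plain UTF-8 text.
    return f.read().decode("utf-8")

def iter_fees(zip_source):
    """Yield (member name, extracted snippet) for every matching member."""
    with zipfile.ZipFile(zip_source) as data:
        for name in data.namelist():
            if not any(year in name for year in YEARS):
                continue  # skip files without a year tag
            with data.open(name) as f:
                # Collapse newlines, form feeds and other whitespace into single spaces
                text = " ".join(extract_text(f).split())
            matches = re.findall(r"FEES (.*?) Types", text, re.IGNORECASE)
            # yield instead of return: the loop carries on with the next member
            yield name, (matches[-1] if matches else "PROBLEM")

# Build a small in-memory archive: four matching members and one that is skipped.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for year in (2018, 2019, 2020, 2021):
        zf.writestr(f"report_{year}.pdf", f"Intro\nFEES annual {year}\nTypes of fees")
    zf.writestr("readme.txt", "not a report")
buf.seek(0)

info = list(iter_fees(buf))
print(info)
```

For the 30 archives, assuming they are named Part_1.zip through Part_30.zip, the outer loop would use range(1, 31); range(1, 2) only produces 1, so it reads a single archive.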

Answer:

When the line return file, text2 runs, it exits the whole function, not just the for loop, so the remaining PDFs are never read.

The solution is to move the return statement out of the for loop (dedent it so it runs after the loop finishes) and return a collection of all the results instead of a single pair.
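A minimal, file-free illustration of the difference (the function names are hypothetical). The with statements in the question are not the cause: returning from inside a with block still exits the function, and the context managers simply close their files on the way out.

```python
def first_only(items):
    for item in items:
        # return ends the whole function on the first pass through the loop
        return item.upper()

def all_items(items):
    results = []
    for item in items:
        results.append(item.upper())
    # return placed after the loop runs only once every item has been processed
    return results

files = ["a_2018.pdf", "b_2019.pdf", "c_2020.pdf", "d_2021.pdf"]
print(first_only(files))  # A_2018.PDF
print(all_items(files))   # ['A_2018.PDF', 'B_2019.PDF', 'C_2020.PDF', 'D_2021.PDF']
```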
