How to iterate over zip archive of zip archives in python?

Time:04-27

There is a zip archive, zip.zip, that contains zip1.zip and zip2.zip. Each of zip1.zip and zip2.zip contains a file 0.txt, and every 0.txt holds the string "Hello world". How can I iterate over zip.zip so that it prints "Hello world" twice?
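To reproduce the setup, here is a minimal sketch that builds the nested archive described above. build_nested_zip is a hypothetical helper, and storing 0.txt as UTF-8 text is an assumption (ZipFile.writestr encodes a str argument as UTF-8).

```python
import io
import zipfile


def build_nested_zip(target="zip.zip", text="Hello world"):
    """Hypothetical helper: write an outer archive holding zip1.zip and
    zip2.zip, each containing a single 0.txt with `text`.
    `target` may be a path or a writable binary file object."""
    with zipfile.ZipFile(target, "w") as outer:
        for inner_name in ("zip1.zip", "zip2.zip"):
            # Build each inner archive in memory first...
            buf = io.BytesIO()
            with zipfile.ZipFile(buf, "w") as inner:
                inner.writestr("0.txt", text)  # str is stored as UTF-8
            # ...then store its raw bytes as a member of the outer archive
            outer.writestr(inner_name, buf.getvalue())
```

Calling build_nested_zip() once creates zip.zip in the current directory, which is enough to try out the code below.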

I have this, but it doesn't work:

from zipfile import ZipFile

with ZipFile('zip.zip', 'r') as zipObj:
    listOfFileNames = zipObj.namelist()
    for fileName in listOfFileNames:
        # Check whether the filename ends with .zip
        if fileName.endswith('.zip'):
            with ZipFile(fileName) as z:
                txt = z.read('0.txt')
                print (txt)

Traceback:

    with ZipFile(zip.zip, 'r') as zipObj:
  File "E:\python\Lib\zipfile.py", line 1265, in __init__
    self._RealGetContents()
  File "E:\python\Lib\zipfile.py", line 1328, in _RealGetContents
    endrec = _EndRecData(fp)
  File "E:\python\Lib\zipfile.py", line 264, in _EndRecData
    fpin.seek(0, 2)
AttributeError: 'ZipFile' object has no attribute 'seek'

Update:

The previous traceback came from my production project, where I passed list_of_files instead of the archive zip.zip; I had forgotten to pick the first file (list_of_files[0]):

with ZipFile(list_of_files[0], 'r') as zipObj:
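The rule behind that fix: ZipFile expects a single path or a seekable binary file object. Anything else, such as a whole list or an already-open ZipFile, fails as soon as ZipFile tries to seek() to the end of "the file", which is the AttributeError in the traceback. A small sketch reproducing this, where the in-memory archive and the one-element list_of_files are hypothetical stand-ins for the real files:

```python
import io
import zipfile

# Build a tiny archive in memory so the example needs no files on disk
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("0.txt", "Hello world")

# Hypothetical stand-in for the asker's list_of_files
list_of_files = [buf]

# Correct: pass ONE path or seekable binary file object
with zipfile.ZipFile(list_of_files[0], "r") as zipObj:
    print(zipObj.namelist())

# Wrong: neither a list nor an open ZipFile has a seek() method,
# so ZipFile raises AttributeError while looking for the archive's end record
errors = []
for bad in (list_of_files, zipfile.ZipFile(buf)):
    try:
        zipfile.ZipFile(bad, "r")
    except AttributeError as exc:
        errors.append(str(exc))
        print(exc)
```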

Answer:

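ZipFile(fileName) looks for fileName on disk, but zip1.zip and zip2.zip only exist inside zip.zip. Read each inner archive into memory instead, wrap the bytes in io.BytesIO(), and open that with ZipFile: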

from zipfile import ZipFile
import io

with ZipFile('zip.zip', 'r') as zipObj:
    for fileName in zipObj.namelist():
        if fileName.endswith('.zip'):
            # Load the inner archive (e.g. zip1.zip) into an in-memory file
            content = io.BytesIO(zipObj.read(fileName))
            with ZipFile(content) as z:
                for txtName in z.namelist():
                    if txtName.endswith('.txt'):
                        # read() returns bytes; decode to print a str
                        print(z.read(txtName).decode('utf-8'))
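The loop in the answer handles exactly one level of nesting. If the inner archives might themselves contain .zip files, the same BytesIO idea can be applied recursively. This is a hedged sketch, not part of the original answer: iter_nested_texts is a hypothetical name, and decoding as UTF-8 is an assumption about the text files.

```python
import io
import zipfile


def iter_nested_texts(source, suffix=".txt", encoding="utf-8"):
    """Hypothetical helper: yield (path, text) for every member ending in
    `suffix`, descending into .zip members at any depth.
    `source` is a path or a seekable binary file object."""
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name.endswith(".zip"):
                # Inner archive: load its bytes into memory and recurse
                inner = io.BytesIO(zf.read(name))
                for sub_path, text in iter_nested_texts(inner, suffix, encoding):
                    yield f"{name}/{sub_path}", text
            elif name.endswith(suffix):
                # Regular text member: read bytes and decode them
                yield name, zf.read(name).decode(encoding)
```

For the archive in the question, looping over iter_nested_texts('zip.zip') and printing each text should print "Hello world" twice. Each inner archive is read fully into memory, which is fine for small files but worth keeping in mind for large ones.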