Extracting files from zip file from GET request

Time:11-03

I currently have a GET request to a URL that returns three things: a .zip file, a .zipsig file, and a .txt file.

I'm only interested in the .zip file, which contains dozens of .json files. I would like to extract all of these .json files, preferably directly into a single pandas DataFrame, but extracting them into a folder also works.

Code so far, mostly stolen:


import io
import zipfile
import requests

license = requests.get(url, headers={'Authorization': "Api-Token " + 'blah'})
z = zipfile.ZipFile(io.BytesIO(license.content))
billingRecord = z.namelist()[0]
z.extract(billingRecord, path="C:\\Users\\Me\\Downloads\\Json license")

This extracts the entire .zip file to the path. I would like to extract the individual .json files from said .zip file to the path.

CodePudding user response:

I would do something like this. The example uses my own test.zip file, but the steps are:

  1. List the members of the archive using the .infolist() method on your z archive
  2. Check whether each filename ends with the json extension using .endswith('.json')
  3. Extract each matching member with .extract(), passing the member name and the destination directory

You've called your archive z and mine is archive, but that should get you started.

Example code:

import zipfile

with zipfile.ZipFile("test.zip", mode="r") as archive:
    for info in archive.infolist():
        print(info.filename)
        # Only extract members whose name ends with .json
        if info.filename.endswith('.json'):
            print('Match: ', info.filename)
            # Second argument is the destination directory (placeholder name)
            archive.extract(info.filename, "extracted")
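The same steps can be applied to an archive held in memory, as in the question, rather than a file on disk. The following is only a sketch: the helper name `extract_json_members` is made up for illustration, and a small archive built in memory stands in for `license.content`.

```python
import io
import json
import tempfile
import zipfile
from pathlib import Path


def extract_json_members(zip_bytes, dest_dir):
    """Extract only the .json members of an in-memory zip into dest_dir."""
    extracted = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.filename.lower().endswith(".json"):
                # extract() returns the path of the file it wrote
                extracted.append(archive.extract(info, path=dest_dir))
    return extracted


# Build a tiny archive in memory to stand in for the downloaded bytes
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as demo:
    demo.writestr("a.json", json.dumps({"id": 1}))
    demo.writestr("notes.txt", "not json")

with tempfile.TemporaryDirectory() as tmp:
    paths = extract_json_members(buffer.getvalue(), tmp)
    names = sorted(Path(p).name for p in paths)
    print(names)
```

With the real response you would pass `license.content` and your own target folder instead of the demo bytes and temporary directory.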

CodePudding user response:

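import io
import zipfile
import pandas as pd
import json

# `license` is the requests response from the question
dfs = []
with zipfile.ZipFile(io.BytesIO(license.content)) as zfile:
    for info in zfile.infolist():
        # The response archive wraps the .zip that holds the .json files
        if info.filename.endswith('.zip'):
            zfiledata = io.BytesIO(zfile.read(info.filename))
            with zipfile.ZipFile(zfiledata) as json_zips:
                for json_info in json_zips.infolist():
                    if json_info.filename.endswith('.json'):
                        # Parse each .json member and flatten it into a DataFrame
                        json_data = pd.json_normalize(json.loads(json_zips.read(json_info.filename)))
                        dfs.append(json_data)
# Combine everything into one DataFrame with a fresh row index
df = pd.concat(dfs, sort=False, ignore_index=True)
print(df)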
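To check the nested-archive logic without calling the real API, a standard-library-only sketch like the one below may help. It assumes the layout implied by the question (an outer archive holding a .zip, a .zipsig, and a .txt); the names `load_nested_json`, `make_demo_archive`, and the demo file names are hypothetical.

```python
import io
import json
import zipfile


def load_nested_json(outer_bytes):
    """Return {member name: parsed JSON} for each .json file found in
    any .zip nested inside the outer archive."""
    records = {}
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        for outer_info in outer.infolist():
            if not outer_info.filename.endswith(".zip"):
                continue  # skips the .zipsig and .txt members
            inner_bytes = io.BytesIO(outer.read(outer_info))
            with zipfile.ZipFile(inner_bytes) as inner:
                for inner_info in inner.infolist():
                    if inner_info.filename.endswith(".json"):
                        records[inner_info.filename] = json.loads(
                            inner.read(inner_info)
                        )
    return records


def make_demo_archive():
    """Build an outer zip containing an inner zip, a .zipsig and a .txt."""
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w") as inner:
        inner.writestr("one.json", json.dumps({"id": 1}))
        inner.writestr("two.json", json.dumps({"id": 2}))
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w") as outer:
        outer.writestr("records.zip", inner_buf.getvalue())
        outer.writestr("records.zipsig", "signature placeholder")
        outer.writestr("readme.txt", "hello")
    return outer_buf.getvalue()


records = load_nested_json(make_demo_archive())
print(sorted(records))
```

The values of `records` are plain Python objects, so they could be handed to `pd.json_normalize` one by one, as in the answer above, once the structure looks right.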