I currently have a GET request to a URL that returns a zip archive containing three things: a .zip file, a .zipsig file, and a .txt file.
I'm only interested in the .zip file, which holds dozens of .json files. I would like to extract all of these .json files, preferably directly into a single pandas DataFrame, but extracting them into a folder also works.
Code so far, mostly stolen:
```python
import io
import zipfile

import requests

license = requests.get(url, headers={'Authorization': "Api-Token " 'blah'})
z = zipfile.ZipFile(io.BytesIO(license.content))
billingRecord = z.namelist()[0]
z.extract(billingRecord, path="C:\\Users\\Me\\Downloads\\Json license")
```
This extracts the inner .zip file itself to that path. I would like to extract the individual .json files inside that .zip to the path instead.
Answer:
I would do something like this. The example below uses my own test.zip file, but the steps are:
- List the files in the archive using the .infolist() method on your z archive.
- Check whether each filename ends with the JSON extension using .endswith('.json').
- Extract that file with .extract(info, path), where path is the destination folder (the second argument is a directory, not a file name).
You've called your archive z whereas mine is called archive, but that should get you started.
Example code:
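```python
import zipfile

# Open the archive and walk every member it contains
with zipfile.ZipFile("test.zip", mode="r") as archive:
    for info in archive.infolist():
        print(info.filename)
        # Only extract members with a .json extension
        if info.filename.endswith('.json'):
            print('Match: ', info.filename)
            # The second argument is the destination folder, not a file name
            archive.extract(info, path="extracted_json")
```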
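The example above works on a flat archive. In the question, though, the .json files sit inside an inner .zip, so the same steps have to be applied one level down. A minimal sketch of that, assuming the whole response body fits in memory; the function name extract_nested_json and its dest parameter are hypothetical:

```python
import io
import zipfile
from pathlib import Path


def extract_nested_json(outer_bytes, dest):
    """Extract every .json file found inside any .zip nested in the outer archive.

    Assumes the layout from the question: the outer archive holds an inner .zip
    (plus .zipsig/.txt files), and the .json files live in that inner .zip.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        for info in outer.infolist():
            # Pick the inner archive by extension, not by its position in namelist()
            if not info.filename.endswith('.zip'):
                continue
            # Open the inner .zip in memory instead of writing it to disk first
            with zipfile.ZipFile(io.BytesIO(outer.read(info))) as inner:
                for member in inner.infolist():
                    if member.filename.endswith('.json'):
                        # extract() returns the path it wrote the file to
                        extracted.append(inner.extract(member, path=dest))
    return extracted
```

With the question's variables, that would be extract_nested_json(license.content, "C:\\Users\\Me\\Downloads\\Json license"). Selecting the inner archive by its .zip extension avoids relying on namelist()[0] happening to be the right member.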
Answer:
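```python
import io
import json
import zipfile

import pandas as pd

dfs = []
# `license` is the requests response from the question; its body is the outer zip
with zipfile.ZipFile(io.BytesIO(license.content)) as zfile:
    for info in zfile.infolist():
        # The outer archive holds the .zip, .zipsig and .txt; only open the inner .zip
        if info.filename.endswith('.zip'):
            zfiledata = io.BytesIO(zfile.read(info.filename))
            with zipfile.ZipFile(zfiledata) as json_zips:
                for json_info in json_zips.infolist():
                    if json_info.filename.endswith('.json'):
                        # Parse the JSON bytes and flatten nested fields into columns
                        json_data = pd.json_normalize(json.loads(json_zips.read(json_info.filename)))
                        dfs.append(json_data)

# Stack the per-file frames into one DataFrame
df = pd.concat(dfs, sort=False)
print(df)
```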
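To try this approach without calling the API, the code above can be wrapped in a function and fed a synthetic nested archive. The sketch below does that; the function name nested_zip_to_dataframe is hypothetical, and it assumes each .json file holds a JSON object or a list of objects, which is what pd.json_normalize handles.

```python
import io
import json
import zipfile

import pandas as pd


def nested_zip_to_dataframe(outer_bytes):
    """Load every .json file inside the .zip(s) nested in outer_bytes into one DataFrame.

    Sketch assuming each .json file holds an object or a list of objects.
    """
    frames = []
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        for info in outer.infolist():
            # Skip the .zipsig/.txt members; only descend into nested .zip files
            if not info.filename.endswith('.zip'):
                continue
            with zipfile.ZipFile(io.BytesIO(outer.read(info))) as inner:
                for member in inner.infolist():
                    if member.filename.endswith('.json'):
                        # json.loads accepts bytes; json_normalize flattens nested keys
                        frames.append(pd.json_normalize(json.loads(inner.read(member))))
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame instead
        return pd.DataFrame()
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index
    return pd.concat(frames, ignore_index=True, sort=False)
```

With the question's variables, the call is df = nested_zip_to_dataframe(license.content). Without ignore_index=True, every file's rows would start again at index 0, which makes later lookups by index ambiguous.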