How to unzip without writing to disk?

Time:10-16

I have a zip archive path_to_zip_file on a read-only system. I need to read a CSV file, testfile.csv, that is included in the zip archive. Note that the archive contains many different files, but I only want this one CSV file. My goal is to load the content of this CSV file into a pandas DataFrame df.

My code is shown below. Is there any way to update it in such a way that it can be executed in a read-only system? In other words, how can I run it in memory without writing to disk?

import zipfile
import pandas as pd

path_to_zip_file = "data/test.zip"
directory_to_extract_to = "result"
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

csv_file_name = "testfile.csv"
df = pd.read_csv("{}/{}".format(directory_to_extract_to,csv_file_name), index_col=False)

CodePudding user response:

An easy way to do it is to extract it to /tmp, which on many Linux systems is a tmpfs held in RAM (this is not guaranteed, and /tmp has to be writable). You could also use Python's tempfile module to create a temporary directory and extract it there (by default it will usually create that directory under /tmp).
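A minimal sketch of the tempfile idea, assuming the system still has a writable temporary location. The helper name extract_member_to_temp is hypothetical. It extracts only the one member that is needed, not the whole archive, and the temporary directory is deleted when the with block ends.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_member_to_temp(zip_path, member, tmp_dir):
    """Extract a single archive member into tmp_dir and return its path.

    Hypothetical helper: only `member` is written, the other files
    in the archive are left untouched.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return Path(archive.extract(member, path=tmp_dir))


# Usage with the names from the question (assumes pandas is installed):
#   import pandas as pd
#   with tempfile.TemporaryDirectory() as tmp_dir:
#       csv_path = extract_member_to_temp("data/test.zip", "testfile.csv", tmp_dir)
#       df = pd.read_csv(csv_path, index_col=False)
```

This still writes a file, so it only helps when some temporary location is writable. If nothing is writable, reading the member straight from the archive avoids the problem.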

CodePudding user response:

Using ZipFile.open on the already opened archive, we can do just that:

import zipfile
import pandas as pd

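with zipfile.ZipFile("data/test.zip") as archive:
    # ZipFile.open returns a file-like object that reads the member
    # directly from the archive, so nothing is extracted to disk.
    with archive.open("testfile.csv") as csv_file:
        df = pd.read_csv(csv_file)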

print(df)
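Since the archive holds many files, it can help to check which members are CSVs before opening one. The sketch below is one possible way to do that with the standard zipfile module. The helper names list_csv_members and read_member_to_buffer are hypothetical. The second helper reads the member fully into memory and returns a seekable buffer, which may be convenient if the reader needs to seek.

```python
import io
import zipfile


def list_csv_members(zip_source):
    """Return the names of archive members ending in .csv (case-insensitive).

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]


def read_member_to_buffer(zip_source, member):
    """Read one member fully into memory and return it as a BytesIO buffer."""
    with zipfile.ZipFile(zip_source) as archive:
        return io.BytesIO(archive.read(member))


# Usage with the names from the question (assumes pandas is installed):
#   import pandas as pd
#   buffer = read_member_to_buffer("data/test.zip", "testfile.csv")
#   df = pd.read_csv(buffer, index_col=False)
```

Reading the whole member into memory is fine for small files. For a very large CSV, passing the object from ZipFile.open to pd.read_csv, as in the answer above, avoids holding a second copy of the raw bytes.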