I have a zip archive, path_to_zip_file, on a read-only system. I need to read a CSV file, testfile.csv, that is included in the archive. The archive contains many other files, but I only want this one CSV file. My goal is to load its contents into a pandas DataFrame, df.
My code is shown below. Is there any way to change it so that it can run on a read-only system? In other words, how can I do this in memory, without writing to disk?
import zipfile
import pandas as pd
path_to_zip_file = "data/test.zip"
directory_to_extract_to = "result"
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)
csv_file_name = "testfile.csv"
df = pd.read_csv("{}/{}".format(directory_to_extract_to,csv_file_name), index_col=False)
CodePudding user response:
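An easy way to do it is to extract the archive to /tmp, which on many Linux systems is a RAM-backed tmpfs mount (this is not guaranteed, so check your system). You could also use Python's tempfile library to create a temporary directory and extract the file there (by default it will usually create that directory under /tmp). Either way, this only works if the temporary location is writable.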
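A minimal sketch of the tempfile approach, assuming the temporary location is writable and that the CSV sits at the top level of the archive. The helper name read_csv_via_tempdir is made up for illustration. It extracts only the one member rather than the whole archive, and the temporary directory is deleted as soon as the DataFrame has been loaded:

```python
import tempfile
import zipfile

import pandas as pd


def read_csv_via_tempdir(path_to_zip_file, csv_file_name, **read_csv_kwargs):
    """Extract one archive member to a temporary directory and load it.

    Hypothetical helper for illustration; not part of zipfile or pandas.
    """
    # TemporaryDirectory removes the directory and its contents on exit.
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(path_to_zip_file, "r") as zip_ref:
            # extract() writes only the named member and returns its path.
            extracted_path = zip_ref.extract(csv_file_name, path=tmp_dir)
        # read_csv loads the data eagerly, so df remains valid after cleanup.
        return pd.read_csv(extracted_path, **read_csv_kwargs)
```

With the names from the question, the call would be df = read_csv_via_tempdir("data/test.zip", "testfile.csv", index_col=False). Note that this still writes to disk (or to tmpfs), so it is not a true in-memory solution.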
CodePudding user response:
Using ZipFile.open on the already opened archive, you can read the file directly, without extracting anything to disk:
import zipfile
import pandas as pd
with zipfile.ZipFile("archive.zip") as archive:
    # open() returns a file-like object for the member; nothing is written to disk
    with archive.open("testing.txt") as csv_file:
        df = pd.read_csv(csv_file)
print(df)
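If the CSV might be stored inside a subfolder of the archive, or the zip itself is only available as bytes, the same idea can be wrapped in a small function. This is only a sketch: read_csv_from_zip is a made-up name, and matching on the base name is an assumption about how you want to locate the file (it takes the first match if several members share the name).

```python
import posixpath
import zipfile

import pandas as pd


def read_csv_from_zip(zip_source, csv_file_name, **read_csv_kwargs):
    """Load one CSV member of a zip archive without extracting it to disk.

    Hypothetical helper for illustration. zip_source may be a path or a
    binary file-like object such as io.BytesIO.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Member names always use "/" separators, so compare base names
        # to also find e.g. "folder/testfile.csv".
        matches = [
            name for name in archive.namelist()
            if posixpath.basename(name) == csv_file_name
        ]
        if not matches:
            raise FileNotFoundError(f"{csv_file_name!r} not found in archive")
        # Stream the member straight into pandas; no temporary files.
        with archive.open(matches[0]) as csv_file:
            return pd.read_csv(csv_file, **read_csv_kwargs)
```

For the question's case this would be df = read_csv_from_zip("data/test.zip", "testfile.csv", index_col=False). Since zipfile.ZipFile also accepts file-like objects, passing io.BytesIO(zip_bytes) works when the archive never touches the filesystem at all.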