I'm trying to get spreadsheet data from zipped .xlsx files. I'm using rubyzip
to access the contents of the zipfile
Zip::File.open(file_path) do |zip_file|
zip_file.each do |entry|
*process entry*
end
end
My problem is that rubyzip gives a Zip::Entry
object, which, I cant get to work with gems like roo
or creek
.
I've done something similar, but with .csv file. This was as simple as CSV.parse(entry.get_input_stream.read)
. However, that just gives me a string of encoded gibberish when using it on an .xlsx file.
I've looked around and the closest answer I got was temporarily extracting the files, but I want to avoid doing this since the files can get pretty large.
Does anyone have any suggestions? Thanks in advance.
CodePudding user response:
So what you need to do is convert the stream into an IO
object that Roo
can understand.
To determine if the object passed to Roo::Spreadsheet.open
is a "stream" Roo
uses the following method:
def is_stream?(filename_or_stream)
filename_or_stream.respond_to?(:seek)
end
Since a Zip::InputStream
does not respond to seek
you cannot use this object directly. To get around this we simply need an object that does respond to seek
(like a StringIO
)
We can just read
the input stream into the StringIO
directly:
stream = StringIO.new(entry.get_input_stream.read)
Or the Zip
library also provides a method to copy a Zip::InputStream
to another IO
object through the IOExtras
module, which I think reads fairly nicely as well.
Knowing all of the above we can implement as follows:
Zip::File.open(file_path) do |zip_file|
zip_file.each do |entry|
# make sure Roo can handle the file (at least based on the extension)
ext = File.extname(entry.name)&.to_sym
next unless Roo::CLASS_FOR_EXTENSION[ext]
# stream = StringIO.new(entry.get_input_stream.read)
::Zip::IOExtras.copy_stream(stream = StringIO.new, entry.get_input_stream)
spreadsheet = Roo::Spreadsheet.open(stream, extension: ext)
# process file
end
end