I need to read the first (header) row of a big xlsx file (350k x 12 cells, ~30MB) very fast in a Ruby on Rails app. I am using the Roo gem at the moment, which is fine for smaller files, but for files this big it takes 3-4 minutes. Is there a way to do this in seconds?
xlsx = Roo::Spreadsheet.open(file_path)
sheet = xlsx.sheet(0)
header = sheet.row(1)
CodePudding user response:
The Ruby gem roo does not support file streaming; it reads the whole file into memory. As you say, that works fine for smaller files, but not so well for reading small sections of huge files.
You need to use a different library/approach. For example, you can use the creek gem, which describes itself as:
a Ruby gem that provides a fast, simple and efficient method of parsing large Excel (xlsx and xlsm) files.
And, taking the example from the project's README, it's pretty straightforward to translate the code you wrote for roo into code that uses creek:
require 'creek'
creek = Creek::Book.new(file_path)
sheet = creek.sheets[0]
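# sheet.rows is an Enumerator, so take the first row with #first rather than [0];
# each row is a Hash keyed by cell name (e.g. "A1"), so #values gives the header cells
header = sheet.rows.first.values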
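The reason a streaming reader helps here is that rows are produced lazily, so asking for only the first one stops the work early. A minimal sketch of that idea, using a plain Ruby Enumerator as a stand-in (this is not creek's implementation; the row contents and the `parsed` counter are made up for illustration):

```ruby
# Stand-in for a streaming row source: rows are generated one at a time,
# and `parsed` counts how many were actually produced.
parsed = 0
rows = Enumerator.new do |yielder|
  1.upto(350_000) do |i|
    parsed += 1
    yielder << { "A#{i}" => "value #{i}", "B#{i}" => "other #{i}" }
  end
end

# Enumerator#first stops iterating as soon as it has one element,
# so the remaining 349,999 rows are never generated.
header = rows.first.values

puts header.inspect
puts "rows generated: #{parsed}"
```

Calling something like `rows.to_a[0]` instead would generate every row before picking the first, which is the pattern to avoid on a large sheet.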
Note: A quick google of your StackOverflow question title led me to this blog post as the top search result. It's always worth searching on Google first.
CodePudding user response:
Using #gets could work, maybe something like:
first_line_data = File.open(file_path, "rb", &:gets)
first_line_file = File.open("tmp_file.xlsx", "wb") { |f| f << first_line_data }
xlsx = Roo::Spreadsheet.open("tmp_file.xlsx")
# etc...
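One thing worth checking before relying on a line-based read: an .xlsx file is a ZIP container of XML parts, not line-oriented text, so the bytes returned by #gets are unlikely to form a valid workbook on their own. A minimal sketch of how to tell the two cases apart by looking at the first four bytes (the helper name `zip_container?` and the demo files are hypothetical, created only for illustration):

```ruby
require 'tempfile'

# Hypothetical helper: true when the file starts with the ZIP
# local-file-header signature ("PK" followed by bytes 0x03 0x04).
def zip_container?(path)
  File.binread(path, 4) == "PK\x03\x04".b
end

# Demo files: one that only mimics the start of a ZIP archive
# (it is not a real workbook), and one plain-text CSV.
fake_xlsx = Tempfile.new(['demo', '.xlsx'])
fake_xlsx.binmode
fake_xlsx.write("PK\x03\x04".b + "rest of archive")
fake_xlsx.close

csv = Tempfile.new(['demo', '.csv'])
csv.write("id,name\n1,foo\n")
csv.close

xlsx_is_zip = zip_container?(fake_xlsx.path)
csv_is_zip  = zip_container?(csv.path)

puts "xlsx-like file is a ZIP container: #{xlsx_is_zip}"
puts "csv file is a ZIP container: #{csv_is_zip}"
```

If the check returns true, the header row lives inside a compressed XML part of the archive, so a reader that understands the container is needed; reading the first line is only likely to help for plain-text formats such as CSV.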