Ruby 3.1.0
I am trying to parse JSON Lines without blowing up memory, but my routine prints nothing and I can't see where I am going wrong. I open a Tempfile to hold the huge file, which I suspect is mistake #1, but I don't know how else to structure this. I then try to copy the huge file from Google into my tempfile and step through it one line at a time. I get nothing, and I am perplexed.
Oh. I figured it out. IO.copy_stream leaves the tempfile positioned at EOF. I just had to rewind it before reading from it.
require "tempfile"
require "open-uri"
require "json"
url = "https://storage.googleapis.com/somehugefile.jsonl"
inventory_file = Tempfile.new
inventory_file.binmode
uri = URI(url)
IO.copy_stream(uri.open, inventory_file)
f = File.foreach(inventory_file)
f.each_entry {|line| puts JSON.parse(line) }
Answer:
It was simple. I did not know that the copy_stream method leaves the file pointer at the end of the file. So I just had to call rewind on the tempfile, and it all worked as expected.
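For anyone who lands here, this is a minimal sketch of the rewind fix described above, not necessarily the exact code I ended up with. The helper name each_json_line and the sample records are made up for illustration, and a StringIO stands in for uri.open so the snippet runs without a network connection. It also reads through the tempfile handle itself rather than File.foreach.

```ruby
require "tempfile"
require "json"
require "stringio"

# Hypothetical helper: copies a readable source into a tempfile, rewinds it,
# and yields one parsed JSON record per line.
def each_json_line(source)
  return enum_for(:each_json_line, source) unless block_given?

  Tempfile.create("inventory") do |inventory_file|
    inventory_file.binmode
    IO.copy_stream(source, inventory_file)
    inventory_file.rewind # copy_stream left the pointer at EOF; go back to the start

    inventory_file.each_line do |line|
      next if line.strip.empty? # tolerate blank lines between records
      yield JSON.parse(line)    # only the current line is parsed and held
    end
  end
end

# Stand-in for uri.open: a tiny in-memory JSON Lines source (made-up data).
sample = StringIO.new(%({"sku":"a1","qty":3}\n{"sku":"b2","qty":5}\n))
each_json_line(sample) { |record| puts record }
```

With the real file, the call would presumably look like each_json_line(URI(url).open) { |record| ... } after require "open-uri". As far as I can tell, open-uri already buffers larger responses into its own tempfile, so calling each_line directly on the object returned by uri.open may avoid the second copy altogether. I have not measured that on the huge file.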