Ruby and parsing huge JSON-Lines from download


Ruby 3.1.0

I am trying to parse JSON Lines without blowing up memory. My routine prints nothing, and I am wondering where I am going wrong. I open a tempfile to hold the huge file, which I am thinking is mistake #1, but I don't know how else to structure this. I then try to copy the huge file from Google into my tempfile, and then step through it one line at a time. I get nothing... I am perplexed.

Oh. I figured it out. copy_stream leaves the file at EOF. I just had to rewind it to use it.

require "tempfile"
require "open-uri"
require "json"

url = "https://storage.googleapis.com/somehugefile.jsonl"

inventory_file = Tempfile.new
inventory_file.binmode
uri = URI(url)
IO.copy_stream(uri.open, inventory_file)      

f = File.foreach(inventory_file)
f.each_entry {|line| puts JSON.parse(line) }

CodePudding user response:

It was simple. I did not know that the copy_stream method leaves the file pointer at the end of the file. I just had to call rewind on it, and it all worked as expected.
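
For reference, here is a minimal sketch of the working version with the rewind added after the copy. The URL and variable names are the placeholders from the question; the close/unlink at the end is just optional tidy-up.

require "tempfile"
require "open-uri"
require "json"

url = "https://storage.googleapis.com/somehugefile.jsonl"

inventory_file = Tempfile.new
inventory_file.binmode

# Stream the download straight into the tempfile.
URI.open(url) do |remote|
  IO.copy_stream(remote, inventory_file)
end

# copy_stream leaves the file pointer at EOF, so move back to the start
# before reading.
inventory_file.rewind

# Read one JSON Lines record at a time; only the current line is held in memory.
inventory_file.each_line do |line|
  puts JSON.parse(line)
end

inventory_file.close
inventory_file.unlink

Because each_line yields one record at a time, memory use stays bounded even for a very large file, which was the original concern.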
