Import data from gzip archives to MongoDB


I have data stored in gzip archives; each archive contains one big file with JSON documents in the following format:

{key:value, key:value}
{key:value, key:value}
{key:value, key:value}

I need to import this data into MongoDB. What is the best way to do that? I can't extract the archives on my PC, as each uncompressed file is about 1950 MB.

CodePudding user response:

I've imported tens of billions of lines of CSV and JSON into MongoDB over the past year, including from compressed archives. Having tried several approaches to save precious time, here's what I recommend:

  • unzip the file
  • pass it as an argument to mongoimport
  • create the index on the fields you want, but ONLY at the end of the entire data insert process (a minimal sketch of this workflow follows the documentation link below).

You can find the mongoimport documentation at: https://www.mongodb.com/docs/database-tools/mongoimport/
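
For example, here is a minimal sketch of that workflow; the database name mydb, collection name events, data file data.json, and indexed field key are placeholders you would replace with your own:

# 1. decompress the archive (writes data.json next to data.json.gz)
gunzip --keep data.json.gz

# 2. import the newline-delimited JSON documents (mongoimport reads one document per line by default)
mongoimport --db=mydb --collection=events --file=data.json

# 3. only after all files are imported, create the index in mongosh
mongosh mydb --eval 'db.events.createIndex({ key: 1 })'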

If you have a lot of files, you may want to write a for loop in bash that unzips each file and passes the filename as an argument to mongoimport. If you are worried about running out of disk space, you can also delete the unzipped file at the end of each individual mongoimport run.
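
A sketch of such a loop (the *.json.gz pattern and the db/collection names are assumptions, adjust them to your setup):

for f in *.json.gz; do
  gunzip --keep "$f"                                           # writes the uncompressed file next to the .gz
  mongoimport --db=mydb --collection=events --file="${f%.gz}"  # ${f%.gz} strips the .gz suffix
  rm "${f%.gz}"                                                # reclaim disk space before the next archive
done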

Hope it helps!

CodePudding user response:

You can unzip the files to STDOUT and pipe the stream into mongoimport. Then you don't need to save the uncompressed file to your local disk:

gunzip --stdout your_file.json.gz | mongoimport --uri=<connection string> --collection=<collection> --db=<database>
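
If you have many archives, the same pipe can run inside a loop so nothing is ever written to disk; the connection string, database, and collection names below are placeholders for your own:

for f in *.json.gz; do
  # stream each archive straight into mongoimport, which reads from STDIN when --file is omitted
  gunzip --stdout "$f" | mongoimport --uri="mongodb://localhost:27017" --db=mydb --collection=events
done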