I have 600 CSV files of size ~1Mo for a total of roughly 600Mo. I want to put all of them into a sqlite3 db. So my first step would be to merge them into one big csv (of ~600Mo right?) before importing it into a sql db.
However, when I run the following bash command (to merge all files keeping one header):
cat file-chunk0001.csv | head -n1 > file.csv
for f in *.csv; do cat "`pwd`/$f" | tail -n 2 >> file.csv; done
The resulting file.csv has a size of 38Go, at which point the process stops because I have no space left on device.
So my question is: why would the merged file size be more than 50x times bigger than expected? And what can I do to put them in a sqlite3 db with a reasonable size?
CodePudding user response:
I guess my first question is: if you know how to do a for loop, why do you need to merge all the files into a single CSV file? Can't you just load them one after the other?
But your problem is an infinite loop. Your wildcard (*.csv
) includes the file you're writing to. You could put your output file in a different directory or make sure your file glob does not include the output file (for f in file-*.csv
maybe).