I have 6,369 files of 256 MB each (1.63 TB total) stored in a RAM disk volume on a Linux server equipped with 4 TB of RAM. I need to merge them into a single file stored in the same RAM disk. What kind of merge operation would give me the best performance? If more RAM is needed, I can store the original parts on a 1.9 TB NVMe drive. The server has 128 cores.
Notes:
- Files are already compressed
- We do not have any limitations regarding available RAM or NVMe
CodePudding user response:
Provided the files are named so that they sort in the right order (e.g. zero-padded sequential numbers or a sortable date format), cat
should do the trick from the shell prompt:
cat single*.dat > combined.dat
You may want to make sure that glob expansion order is not an issue in your particular shell: https://unix.stackexchange.com/questions/368318/does-the-bash-star-wildcard-always-produce-an-ascending-sorted-list
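If the part names are not zero-padded (hypothetical names like single1.dat through single6369.dat), the glob's lexicographic order will interleave them (single10.dat sorts before single2.dat). A minimal sketch that forces numeric order, assuming GNU sort and filenames without whitespace:

printf '%s\n' single*.dat | sort -V | xargs cat > combined.dat

Here sort -V does a version-aware (numeric) sort and xargs runs cat sequentially on the sorted list, so the concatenation order is preserved in the shared output redirect.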
Other than that, the number of input files should not matter when you expand the glob directly on the command line, but you should still check the argument-length limits of your setup beforehand: https://unix.stackexchange.com/questions/356386/is-there-a-maximum-to-bash-file-name-expansion-globbing-and-if-so-what-is-it
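A quick way to check that limit up front, assuming a Linux system with getconf available:

getconf ARG_MAX    # maximum combined size of arguments plus environment, in bytes

With 6,369 names of roughly 15 bytes each the expansion stays around 100 KB, far below the ~2 MB that is typical on Linux, so a single cat invocation should go through fine.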
CodePudding user response:
It sounds like you don't have an issue of limited memory, so you should just do what Lecraminos suggested.
If space does become an issue, you can compress the (hopefully temporary) destination as you write it:
cat single*.dat | gzip > combined.dat.gz
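Since the parts are already compressed, a second gzip pass will not shrink them much; if you do compress anyway, single-threaded gzip becomes the bottleneck on a 128-core machine. A sketch of the same pipeline using pigz (parallel gzip) instead, assuming it is installed:

cat single*.dat | pigz -p 128 > combined.dat.gz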
or go over the files and remove each one from the (hopefully temporary) storage right after it has been read:
for file in single*.dat; do
    cat "$file"      # append this part to combined.dat via the loop's redirect
    rm -f "$file"    # free its space immediately after it has been consumed
done > combined.dat
or both...
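For example, a sketch that combines the two by piping the loop through gzip, so each part is deleted as soon as it has been read and peak usage stays around one part plus the growing archive:

for file in single*.dat; do
    cat "$file"
    rm -f "$file"
done | gzip > combined.dat.gz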