I have the results of a numerical simulation that consist of hundreds of directories; each directory contains millions of text files.
I need to substitute the string `wavelength;` with `wavelength_bc;`, so I have tried both of the following:
find . -type f -exec sed -i 's/wavelength;/wavelength_bc;/g' {} \;
and
find . -type f -exec sed -i 's/wavelength;/wavelength_bc;/g' {} +
Unfortunately, the commands above take a very long time to finish (more than an hour).
How can I take advantage of the number of cores on my machine (8) to accelerate them?
I am thinking of using `xargs` with the `-P` flag, but I'm worried that parallel writes could corrupt the files, so I don't know whether that is safe.
In summary:
- How can I accelerate `sed` substitutions when using `find`?
- Is it safe to use `xargs -P` to run them in parallel?
Thank you
CodePudding user response:
`xargs -P` should be safe to use; however, you will need the `-print0` option of `find`, piping into `xargs -0`, so that filenames containing spaces or wildcard characters are handled correctly:
find . -type f -print0 |
xargs -0 -I {} -P 0 sed -i 's/wavelength;/wavelength_bc;/g' {}
The `-P 0` option tells `xargs` to run in parallel mode, starting as many processes at a time as possible for your CPU.
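One caveat: combining `-I {}` with `-P` starts a separate `sed` process for every single file, which is costly with millions of files. Dropping `-I` lets `xargs` batch many filenames into each `sed` invocation, so far fewer processes are spawned. A minimal sketch on a throwaway directory (the `/tmp/sedtest` path and file contents are made up for illustration; GNU `sed` is assumed):

```shell
# Build a tiny test tree, including a filename with a space.
mkdir -p /tmp/sedtest/run1
printf 'x = wavelength;\n' > /tmp/sedtest/run1/f1.txt
printf 'y = wavelength;\n' > '/tmp/sedtest/run1/f 2.txt'

# -n 1000 passes up to 1000 files to each sed process;
# -P 8 keeps up to 8 sed processes running at once (one per core).
find /tmp/sedtest -type f -print0 |
  xargs -0 -n 1000 -P 8 sed -i 's/wavelength;/wavelength_bc;/g'

# Both files now contain the replacement.
grep -r 'wavelength_bc;' /tmp/sedtest
```

This is safe because each file goes to exactly one `sed` process, and GNU `sed -i` writes a temporary file and renames it into place, so no two processes ever write to the same file.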
CodePudding user response:
This might work for you (GNU sed & parallel):
find . -type f | parallel sed -i 's/wavelength;/wavelength_bc;/g' {}
GNU parallel will run as many jobs in parallel as there are cores on the machine. More sophisticated uses can involve remote servers and file transfer; see the GNU parallel documentation and its cheat sheet.