How to accelerate substitution when using GNU sed with GNU find?

Time:10-17

I have the results of a numerical simulation that consist of hundreds of directories; each directory contains millions of text files.

I need to substitute the string "wavelength;" with "wavelength_bc;", so I have tried both of the following:

find . -type f -exec sed -i 's/wavelength;/wavelength_bc;/g' {} \;

and

find . -type f -exec sed -i 's/wavelength;/wavelength_bc;/g' {} +

Unfortunately, both commands take a very long time to finish (more than 1 hour).

How can I take advantage of the number of cores on my machine (8) to accelerate the commands above?

I am thinking of using xargs with the -P flag, but I'm worried that it could corrupt the files, and I don't know whether it is safe.

In summary:

  • How can I accelerate sed substitutions when using sed with find?
  • Is it safe to use xargs -P to run them in parallel?

Thank you

CodePudding user response:

xargs -P should be safe to use here, because each file is passed to exactly one sed process. However, you need find's -print0 option piped into xargs -0 so that filenames containing spaces, newlines, or other special characters are handled correctly:

find . -type f -print0 |
xargs -0 -I {} -P 0 sed -i 's/wavelength;/wavelength_bc;/g' {}

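The -P 0 option tells xargs to run as many processes as possible at a time; it is not limited to the number of CPU cores. To cap it at the 8 cores mentioned in the question, use -P 8 instead. Note that -I {} makes xargs start one sed process per file.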
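A possible refinement of the command above, as a sketch rather than a tested recipe for the asker's data: with millions of files, starting one sed per file is likely to dominate the run time, so it may help to pass many files to each sed and to skip files that do not contain the string at all. The grep pre-filter is an addition that is not part of the answer. The scratch directory, the sample files, and the batch size of 1000 are hypothetical; -P 8 matches the core count given in the question.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch tree so the sketch can be tried without touching real data.
workdir=$(mktemp -d)
mkdir -p "$workdir/run 1" "$workdir/run2"
printf 'wavelength; 500\n' > "$workdir/run 1/a.txt"
printf 'frequency; 3\n'    > "$workdir/run 1/b.txt"
printf 'wavelength; 600\nwavelength; 700\n' > "$workdir/run2/c d.txt"

# grep -rlZ -F prints only the names of files containing the literal string,
# NUL-terminated. xargs passes them to sed in batches of up to 1000 files
# (assumed batch size), running up to 8 sed processes at a time.
# No file appears in more than one batch, so each file is rewritten by
# exactly one sed process.
grep -rlZ -F 'wavelength;' "$workdir" |
  xargs -0 -r -n 1000 -P 8 sed -i 's/wavelength;/wavelength_bc;/g'

# Quick check of the result.
remaining=$(grep -rlF 'wavelength;' "$workdir" | wc -l)
replaced=$(grep -rhF 'wavelength_bc;' "$workdir" | wc -l)
echo "files still containing the old string: $remaining"
echo "lines containing the new string: $replaced"
```

On real data, "$workdir" would be replaced by the top-level simulation directory. Because "wavelength_bc;" no longer matches the search string, running the command a second time leaves the files unchanged.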

CodePudding user response:

This might work for you (GNU sed & parallel):

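find . -type f -print0 | parallel -0 -q sed -i 's/wavelength;/wavelength_bc;/g' {}

The -q option makes parallel quote the command, so the semicolons in the sed expression are not interpreted by the shell that parallel starts for each job. The -print0 and -0 options keep filenames that contain newlines intact. By default, GNU parallel runs as many jobs in parallel as there are cores on the machine.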

More sophisticated uses can involve remote servers and file transfer; see the GNU parallel documentation and cheat sheet.
