I have a really huge text file, pattern.txt (more than 4 GB), and I have multiple entries to search and replace in it.
So I made a file called leo.sed and ran sed with the -f option.
leo.sed: This file contains around 500 entries. Example:
s/"PET10"/"PETfdfd0"/g
s/"PET11"/"PET123wef"/g
s/"PET12"/"TETPrandom"/g
I am using the following sed command, but it is extremely slow:
sed -f leo.sed pattern.txt | sed -f leo1.sed > pattern_after_leo_leo1_sed.txt
Is there a faster way to do this with a Perl one-liner?
Thanks!!
CodePudding user response:
If it only needs to be done once and it's "fast enough", set it running and do something else. Your time is more valuable than the computer's.
If you're limited by how fast your disk is, there's not much you can do.
If not, the same technique (applying all 500 patterns to each line) is unlikely to be any faster in Perl. Instead, you need to improve the algorithm: reduce the number of regexes by finding some common pattern among them.
For example, if every target is a quoted string, we can use one regex that matches anything in quotes. Then the replacement value comes from a hash. We set up the hash in a BEGIN
block so that it is built only once, before the file is scanned. We can use the babycart operator, @{[ ... ]}, to interpolate an expression in a string.
perl -i.orig -pe 'BEGIN { %replacements = (PET10 => "PETfdfd0", PET11 => "PET123wef"); } s{"([^"]+)"}{"@{[$replacements{$1} || $1]}"}g' test.txt
Now each line needs only to be scanned once. This may or may not be faster.
CodePudding user response:
This is a faster version of the code Schwern posted:
perl -i.orig -pe'
BEGIN {
%replacements = map qq{"$_"}, (
PET10 => "PETfdfd0",
PET11 => "PET123wef",
);
}
s{"[^"]+"}{ $replacements{$&} // $& }eg
' test.txt