I have a huge text file (~1.5GB) with numerous lines ending with ".Ends".
I need a Linux one-liner (perl/awk/sed) to find the last place '.Ends' appears in the file and add a couple of lines before it.
I tried using tac twice, and stumbled over my Perl:
When I use:
tac ../../test | perl -pi -e 'BEGIN {$flag = 1} if ($flag==1 && /.Ends/) {$flag = 0 ; print "someline\n"}' | tac
It first prints the "someline\n" and only then prints the .Ends
The result is:
…
.Ends
someline
When I use:
tac ../../test | perl -e 'BEGIN {$flag = 1} print ; if ($flag==1 && /.Ends/) {$flag = 0 ; print "someline\n"}' | tac
It doesn’t print anything.
And when I use:
tac ../../test | perl -p -e 'BEGIN {$flag = 1} print $_ ; if ($flag==1 && /.Ends/) {$flag = 0 ; print "someline\n"}' | tac
It prints everything twice:
…
.Ends
someline
.Ends
Is there a smooth way to perform this edit?
It doesn't have to follow my solution's direction; I'm not picky...
Bonus - if the added lines could come from a different file, that would be great (but really not a must)
Edit
test input file:
gla2
fla3
dla4
rfa5
.Ends
shu
sha
she
.Ends
res
pes
ges
.Ends
--->
...
pes
ges
someline
.Ends
CodePudding user response:
With Perl, you can use the Tie::File core module to easily read a file from the end, add lines where you need them, and that's it.
It does not read the whole file into memory, and there is no need to cut and paste within the file. It should be very fast.
use strict;
use warnings;
use Tie::File;

# Usage: foo.pl target.txt newlines.txt

my $file = shift;       # file to edit
chomp(my @new = <>);    # read new lines via <>

tie my @file, 'Tie::File', $file or die "Cannot tie '$file': $!";

my $index = $#file;                     # start from last line in file
while ($index >= 0) {                   # loop over line numbers
    if ($file[$index] =~ /\.Ends/) {
        splice @file, $index, 0, @new;  # add new lines before it
        last;                           # then exit loop
    }
    $index--;                           # next line, going backwards
}
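Saved as, say, foo.pl (the name is only taken from the usage comment above), it would be run with the target file first and the file of new lines second:

$ perl foo.pl target.txt newlines.txt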
CodePudding user response:
Inputs:
$ cat test.dat
dla4
.Ends
she
.Ends
res
.Ends
abc
$ cat new.dat
newline 111
newline 222
One awk idea that sticks with OP's tac | <process> | tac approach:
$ tac test.dat | awk -v new_dat="new.dat" '1;/.Ends/ && !(seen++) {system("tac " new_dat)}' | tac
dla4
.Ends
she
.Ends
res
newline 111
newline 222
.Ends
abc
Another awk idea that replaces the dual tac calls with a dual pass of the input file:
$ awk -v new_dat="new.dat" 'FNR==NR { if ($0 ~ /.Ends/) lastline=FNR; next} FNR==lastline { system("cat "new_dat) }; 1' test.dat test.dat
dla4
.Ends
she
.Ends
res
newline 111
newline 222
.Ends
abc
NOTES:
- both of these solutions write the modified data to stdout (same thing OP's current code does)
- neither of these solutions modifies the original input file (test.dat)
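If an in-place result is wanted, one option (an addition here, not part of the answer above) is to write to a temporary file and then move it over the original, e.g.:

$ tac test.dat | awk -v new_dat="new.dat" '1;/.Ends/ && !(seen++) {system("tac " new_dat)}' | tac > test.dat.tmp && mv test.dat.tmp test.dat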
CodePudding user response:
Using GNU sed, -i.bak edits the original file in place while creating a backup copy with a .bak extension:
$ sed -Ezi.bak 's/(.*)(\.Ends)/\1newline\nnewline\n\2/' input_file
$ cat input_file
gla2
fla3
dla4
rfa5
.Ends
shu
sha
she
.Ends
res
pes
ges
.Ends
--->
...
pes
ges
someline
newline
newline
.Ends
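If the result is not what was wanted, the backup can simply be moved back (a trivial illustration, assuming nothing else has touched the files since):

$ mv input_file.bak input_file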
CodePudding user response:
Assuming that the last instance of that phrase is far down the file, it helps performance greatly to process the file from the back, for example using File::ReadBackwards.
Since you need to add other text before the last marker, we have to copy the rest of the file so as to be able to put it back after the addition.
use warnings;
use strict;
use feature 'say';
use Path::Tiny;
use File::ReadBackwards;

my $file = shift // die "Usage: $0 file\n";

my $bw = File::ReadBackwards->new($file);

my @rest_after_marker;
while ( my $line = $bw->readline ) {
    unshift @rest_after_marker, $line;
    last if $line =~ /\.Ends/;
}

# Position after which to add text and copy back the rest
my $pos = $bw->tell;
$bw->close;

open my $fh, '+<', $file or die $!;
seek $fh, $pos, 0;
truncate $fh, $pos;

print $fh $_ for path("add.txt")->slurp, @rest_after_marker;
The new text to add before the last .Ends is presumably in a file add.txt. This code can be shortened and written as a command-line program (a "one-liner"), but I assume that isn't essential.
The question remains of how much of the file there is after the last .Ends marker. Note that we are copying all of that in memory in order to be able to write it back. If that is too much, copy it to a temporary file instead of memory, then use it from there and remove the file.
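A rough sketch of that temporary-file variant, assuming the same use lines and $file as in the script above plus the core File::Temp module (an untested illustration, not part of the answer's code):

use File::Temp qw(tempfile);

my $bw = File::ReadBackwards->new($file) or die "Cannot open '$file': $!";
while ( my $line = $bw->readline ) {
    last if $line =~ /\.Ends/;   # stop at the last marker; don't keep lines in memory
}
my $pos = $bw->tell;             # offset of the start of the marker line
$bw->close;

# Stash everything from the marker onward in a temporary file
my ($tmp, $tmp_name) = tempfile(UNLINK => 1);
open my $in, '<', $file or die $!;
seek $in, $pos, 0;
print {$tmp} $_ while <$in>;
close $in;

# Truncate at the marker, write the new lines, then copy the stashed tail back
open my $out, '+<', $file or die $!;
seek $out, $pos, 0;
truncate $out, $pos;
print {$out} path("add.txt")->slurp;
seek $tmp, 0, 0;
print {$out} $_ while <$tmp>;
close $out;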
CodePudding user response:
Inputs:
$ cat test.dat
dla4
.Ends
she
.Ends
res
.Ends
abc
$ cat new.dat
newline 111
newline 222
One ed approach:
$ ed test.dat >/dev/null 2>&1 <<EOF
1
?.Ends
-1r new.dat
wq
EOF
Where:
- >/dev/null 2>&1 - brute force suppression of diagnostic and info messages
- 1 - go to line #1
- ?.Ends - search backwards in the file for the string .Ends (ie, find the last .Ends in the file)
- -1r new.dat - move back/up 1 line (-1) in the file and (r)ead in the contents of new.dat
- wq - (w)rite and (q)uit (aka save and exit)
This generates:
$ cat test.dat
dla4
.Ends
she
.Ends
res
newline 111
newline 222
.Ends
abc
NOTE: unlike OP's current code, which writes the modified data to stdout, this solution modifies the original input file (test.dat)
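Since ed works on test.dat directly, keeping a copy beforehand is a cheap safeguard (just an illustration):

$ cp test.dat test.dat.orig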
CodePudding user response:
$ tac file | awk '{print} $0==".Ends"{print "bar\nfoo"}' | tac
gla2
fla3
dla4
rfa5
foo
bar
.Ends
shu
sha
she
foo
bar
.Ends
res
pes
ges
.Ends
--->
...
pes
ges
someline
foo
bar
.Ends
Note you have to print the added lines in reverse order as the final tac will reverse them again. There are easy ways to not have to worry about that if you care, e.g.:
$ tac file |
awk -v str='foo\nbar' '
BEGIN{ n=split(str,lines,RS); str=""; for (i=n; i>=1; i--) str=str lines[i] ORS }
{ print $0 }
$0 == ".Ends" { printf "%s", str }
' |
tac
gla2
fla3
dla4
rfa5
foo
bar
.Ends
shu
sha
she
foo
bar
.Ends
res
pes
ges
.Ends
--->
...
pes
ges
someline
foo
bar
.Ends
or if you want to read the new lines from a file:
$ cat new
foo
bar
$ tac file | awk 'NR==FNR{str=$0 ORS str; next} {print} $0==".Ends"{printf "%s", str}' new - | tac
gla2
fla3
dla4
rfa5
foo
bar
.Ends
shu
sha
she
foo
bar
.Ends
res
pes
ges
.Ends
--->
...
pes
ges
someline
foo
bar
.Ends
CodePudding user response:
First let grep do the searching, then inject the lines with awk.
$ cat insert
new content
new content
$ line=$(cat insert)
$ awk -v var="${line}" '
NR==1{last=$1; next}
FNR==last{print var}1' <(grep -n "^\.Ends$" file | cut -f 1 -d : | tail -1) file
rfa5
.Ends
she
.Ends
ges
.Ends
ges
new content
new content
.Ends
ges
ges
Data
$ cat file
rfa5
.Ends
she
.Ends
ges
.Ends
ges
.Ends
ges
ges
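For reference, the grep pipeline inside the process substitution produces just the line number of the last .Ends (8 for the Data above), which the awk script then reads as its first record:

$ grep -n "^\.Ends$" file | cut -f 1 -d : | tail -1
8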