I understand that the general approach is to use something like
$ sort file1.txt | uniq > file2.txt
But I was wondering if there was a way to do this without needing separate source and destination files, even if it means it can't be a one-liner.
CodePudding user response:
With GNU awk for "inplace" editing:
awk -i inplace '!seen[$0]++' file1.txt
As with all tools that support "inplace" editing (sed -i, perl -i, ruby -i, etc.), this uses a temp file behind the scenes; the exception is ed, which requires the whole file to be read into memory first.
With any awk you can do the following with no temp file, at the cost of roughly twice the memory:
awk '!seen[$0]++{a[++n]=$0} END{for (i=1;i<=n;i++) print a[i] > FILENAME}' file
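For a quick illustration, here is the in-place variant run on a small throwaway file (hypothetical sample data; GNU awk is required for -i inplace):
$ printf 'a\nb\na\nc\nb\n' > file1.txt
$ awk -i inplace '!seen[$0]++' file1.txt
$ cat file1.txt
a
b
c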
CodePudding user response:
With Perl's -i:
perl -i -lne 'print unless $seen{$_}++' original.file
- -i changes the file "in place";
- -n reads the input line by line, running the code for each line;
- -l removes newlines from input and adds them to print;
- The %seen hash idiom is described in perlfaq4.
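For illustration, a run on throwaway sample data (hypothetical file contents, just to show the effect):
$ printf 'foo\nbar\nfoo\nbaz\n' > original.file
$ perl -i -lne 'print unless $seen{$_}++' original.file
$ cat original.file
foo
bar
baz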
CodePudding user response:
A common idiom is:
temp=$(mktemp)
some_pipeline < original.file > "$temp" && mv "$temp" original.file
The && is important: if the pipeline fails, the original file won't be overwritten with (perhaps) garbage.
The Linux moreutils package contains a program, sponge, that encapsulates this:
some_pipeline < original.file | sponge original.file
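Applied to this question, the deduplication command from the other answers can be slotted in as the pipeline; a minimal sketch, assuming moreutils is installed for the sponge variant:
temp=$(mktemp)
awk '!seen[$0]++' original.file > "$temp" && mv "$temp" original.file
or, equivalently:
awk '!seen[$0]++' original.file | sponge original.file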
CodePudding user response:
Simply use the -o and -u options of sort:
sort -o file -u file
You don't even need to pipe to another command, such as uniq.
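Note that, unlike the awk and perl approaches above, this sorts the file as a side effect. A run on hypothetical sample data:
$ printf 'b\na\nb\nc\na\n' > file
$ sort -o file -u file
$ cat file
a
b
c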
CodePudding user response:
Using sed
$ sed -i -n 'G;/^\(.*\n\).*\n\1$/d;H;P' input_file
- G - Append the hold space to the pattern space.
- /^\(.*\n\).*\n\1$/d - Using back-referencing, match and delete duplicated lines.
- H - Append the pattern space to the hold space.
- P - Print the current pattern space up to the first newline.
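A quick run on throwaway sample input (hypothetical contents; shown with GNU sed, which accepts -i without a suffix):
$ printf 'x\ny\nx\nz\n' > input_file
$ sed -i -n 'G;/^\(.*\n\).*\n\1$/d;H;P' input_file
$ cat input_file
x
y
z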