I have two files: log.txt and log.bak2022.06.20.10.00.txt.
The file log.bak2022.06.20.10.00.txt is a backup of log.txt taken at 2022.06.20 10:00, but log.txt keeps growing.
Now I have a requirement: I want to get the content of log.txt minus the content of log.bak2022.06.20.10.00.txt, then write it into a new file.
Is it possible to implement this?
CodePudding user response:
Assumptions:
- the small file contains N lines, and these N lines are an exact match for the 1st N lines in the big file
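Since everything below depends on that assumption, it may be worth checking it first. The following is only a sketch: it recreates the sample inputs in a temporary directory (the workdir and prefix_ok names are made up for illustration) and compares the first bytes of the big file against the small one. It relies on head -c, which is widely available but not required by POSIX.

```shell
#!/usr/bin/env bash
# Sketch: check that "small" is an exact byte prefix of "big".
# The files are the sample inputs, recreated in a temp dir (hypothetical setup).
workdir=$(mktemp -d)
printf '%s\n' 4 2 1 3 > "$workdir/small"
printf '%s\n' 4 2 1 3 8 10 9 4 > "$workdir/big"

# Byte count of the small file; the arithmetic expansion strips any padding from wc.
size=$(( $(wc -c < "$workdir/small") ))

# Compare the first $size bytes of big with small.
# cmp -s prints nothing and only sets the exit status.
if head -c "$size" "$workdir/big" | cmp -s - "$workdir/small"; then
    prefix_ok=yes
else
    prefix_ok=no
fi
echo "prefix matches: $prefix_ok"
```

If the check reports no, the backup and the log have diverged (for example after log rotation) and none of the offset-based ideas should be trusted.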
Sample inputs:
$ cat small
4
2
1
3
$ cat big
4
2
1
3
8
10
9
4
One comm idea:
$ comm --nocheck-order -13 small big
8
10
9
4
One awk idea:
$ awk '
FNR==NR { max=FNR; next }
FNR>max
' small big
8
10
9
4
One wc/sed idea:
$ max=$(wc -l < small)
$ ((max++))
$ sed -n "$max,$ p" big
8
10
9
4
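A related line-based variant, not part of the ideas above, is tail -n +K, which prints from line K onward and so avoids sed altogether. This is a minimal sketch under the same prefix assumption; the workdir and new_lines names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: skip the first N lines of big, where N is the line count of small.
# The files are the sample inputs, recreated in a temp dir (hypothetical setup).
workdir=$(mktemp -d)
printf '%s\n' 4 2 1 3 > "$workdir/small"
printf '%s\n' 4 2 1 3 8 10 9 4 > "$workdir/big"

# Line count of the small file; the arithmetic expansion strips any padding from wc.
max=$(( $(wc -l < "$workdir/small") ))

# tail -n +K starts output at line K, so K = max + 1 is the first new line.
new_lines=$(tail -n "+$((max + 1))" "$workdir/big")
printf '%s\n' "$new_lines"
```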
CodePudding user response:
- You could try grep:
grep -vxFf log.bak2022.06.20.10.00.txt log.txt
That will output all the lines of log.txt that don't match any line of log.bak2022.06.20.10.00.txt.
- Or this one with tail and stat, which should be fast for big files:
tail -c "+$(( $(stat -c %s log.bak2022.06.20.10.00.txt) + 1 ))" log.txt
Note: stat -c %s is a GNU extension; on BSD you can use stat -f %z.
Remark: I'm not sure that tail -c uses fseek, but if it does, that would be great for this use case.
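To see how the two approaches differ and how the result ends up in a new file, here is a rough sketch on small sample data. The file names under workdir (log.bak.txt, log.new.txt, log.grep.txt) are placeholders, and wc -c is swapped in for stat only to sidestep the GNU/BSD difference. Note that the grep form also drops a new line whose text happens to equal an old line, while the byte-offset form keeps it.

```shell
#!/usr/bin/env bash
# Sketch: write the part of the log that was appended after the backup to a new file.
# File names and contents are placeholders for illustration.
workdir=$(mktemp -d)
printf '%s\n' 4 2 1 3 > "$workdir/log.bak.txt"
printf '%s\n' 4 2 1 3 8 10 9 4 > "$workdir/log.txt"

# Byte offset of the first new byte: backup size + 1 (tail -c +K is 1-based).
offset=$(( $(wc -c < "$workdir/log.bak.txt") + 1 ))
tail -c "+$offset" "$workdir/log.txt" > "$workdir/log.new.txt"

# For comparison: the grep approach filters by line content, not by position.
grep -vxFf "$workdir/log.bak.txt" "$workdir/log.txt" > "$workdir/log.grep.txt"

echo "tail result:"; cat "$workdir/log.new.txt"
echo "grep result:"; cat "$workdir/log.grep.txt"
```

On this data the tail result keeps the trailing 4, whereas the grep result omits it because 4 also appears in the backup. Which behavior is wanted depends on whether repeated log lines are possible.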
CodePudding user response:
An awk-based solution without the need for Unix pipe (|) chains, regexes, function calls, or array splitting:
{m,n,g}awk '(_ += NR==FNR ) < FNR' FS='^$' small.txt big.txt
8
10
9
4