Is it possible to get the content of file1 minus file2 with a bash command?

I have two files:

log.txt 
log.bak2022.06.20.10.00.txt

log.bak2022.06.20.10.00.txt is a backup of log.txt taken at 2022.06.20 10:00.

log.txt itself keeps growing after that.

Now I need to get the content of log.txt minus the content of log.bak2022.06.20.10.00.txt and write it to a new file. Is that possible?

CodePudding user response：

Assumptions:

the small file contains N lines, and these N lines are an exact match for the 1st N lines in the big file
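
If that assumption is in doubt, it can be checked before extracting anything. The following is only a sketch: is_prefix is a hypothetical helper name, and the files under the temporary directory just mirror the sample inputs used in this answer.

```shell
# Hypothetical helper: succeeds only if the bytes of "$1" match the start of "$2".
is_prefix() {
    local small=$1 big=$2 size
    # $(( ... )) strips the leading blanks some wc implementations print.
    size=$(( $(wc -c < "$small") ))
    # Compare the first $size bytes of the big file with the whole small file.
    head -c "$size" "$big" | cmp -s - "$small"
}

# Throwaway copies of the sample inputs.
tmp=$(mktemp -d)
printf '%s\n' 4 2 1 3 > "$tmp/small"
printf '%s\n' 4 2 1 3 8 10 9 4 > "$tmp/big"

if is_prefix "$tmp/small" "$tmp/big"; then
    prefix_ok=yes
else
    prefix_ok=no
fi
echo "$prefix_ok"
```

The check is byte-based, so it also catches a backup that differs only in whitespace or line endings.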

Sample inputs:

$ cat small
4
2
1
3

$ cat big
4
2
1
3
8
10
9
4

One comm idea:

$ comm --nocheck-order -13 small big
8
10
9
4

One awk idea:

$ awk '
FNR==NR { max=FNR; next }
FNR>max
' small big
8
10
9
4

One wc/sed idea:

$ max=$(wc -l < small)
$ ((max++))
$ sed -n "$max,$ p" big
8
10
9
4
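
A possible variation on the same line-count idea, not part of the answer above: tail -n +K starts printing at line K, so the count from wc -l can be passed to tail instead of sed. The sketch assumes the same prefix relationship between the two files and uses throwaway copies of the sample inputs.

```shell
# Throwaway copies of the sample inputs.
tmp=$(mktemp -d)
printf '%s\n' 4 2 1 3 > "$tmp/small"
printf '%s\n' 4 2 1 3 8 10 9 4 > "$tmp/big"

# Number of lines already present in the small file.
max=$(( $(wc -l < "$tmp/small") ))

# Print the big file starting at the first line after those.
tail_out=$(tail -n +"$(( max + 1 ))" "$tmp/big")
printf '%s\n' "$tail_out"
```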

CodePudding user response：

You could try grep:

grep -vxFf log.bak2022.06.20.10.00.txt log.txt

That outputs every line of log.txt that is not identical to some line of log.bak2022.06.20.10.00.txt.
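
One thing to keep in mind with the grep approach: it filters by line content, not by position, so a newly appended line that repeats an older line is dropped as well. A small sketch using the small/big sample data from the first answer (the temporary paths are only for illustration):

```shell
# Throwaway copies of the sample inputs from the first answer.
tmp=$(mktemp -d)
printf '%s\n' 4 2 1 3 > "$tmp/small"
printf '%s\n' 4 2 1 3 8 10 9 4 > "$tmp/big"

# -F: fixed strings, -x: whole-line match, -f: read patterns from a file,
# -v: print only the lines that do NOT match.
grep_out=$(grep -vxFf "$tmp/small" "$tmp/big")
printf '%s\n' "$grep_out"
# prints 8, 10 and 9; the final 4 is missing because 4 also occurs in small
```

If the log can contain repeated lines, a position-based approach is the safer choice.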

Or this one with tail and stat, which should be fast for big files:

tail -c +"$(( $(stat -c %s log.bak2022.06.20.10.00.txt) + 1 ))" log.txt

Note: stat -c %s is a GNU extension; on BSD you can use stat -f %z.

Remark: I'm not sure whether tail -c uses fseek, but if it does, that would be great for this use case.
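
A minimal sketch of the byte-offset approach that also writes the result to a new file, as the question asks. It uses wc -c instead of stat to sidestep the GNU/BSD difference. The file names under the temporary directory (log.bak.txt, log.new.txt) are hypothetical, and the sketch assumes the backup is an exact byte prefix of the log.

```shell
# Throwaway files standing in for the backup and the live log.
tmp=$(mktemp -d)
printf '%s\n' 4 2 1 3 > "$tmp/log.bak.txt"
printf '%s\n' 4 2 1 3 8 10 9 4 > "$tmp/log.txt"

# Size of the backup in bytes; $(( ... )) strips blanks some wc versions print.
size=$(( $(wc -c < "$tmp/log.bak.txt") ))

# tail -c +K starts output at byte K, so skip the first $size bytes.
tail -c +"$(( size + 1 ))" "$tmp/log.txt" > "$tmp/log.new.txt"

cat "$tmp/log.new.txt"
```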

CodePudding user response：

An awk-based solution that needs no Unix pipe (|) chains, regex, function calls, or array splitting ({m,n,g}awk stands for any of mawk, nawk, or gawk):

{m,n,g}awk '(_ += NR==FNR) < FNR' FS='^$' small.txt big.txt

8
10
9
4
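
For readers who find that form terse, here is a more verbose sketch of the same counting idea: keep a running count of the lines in the first file, then print only those lines of the second file whose line number exceeds that count. The variable name count is illustrative, plain awk is assumed to be available, and the first file is assumed to be non-empty (with an empty first file, NR==FNR also holds for the second file).

```shell
# Throwaway copies of the sample inputs.
tmp=$(mktemp -d)
printf '%s\n' 4 2 1 3 > "$tmp/small.txt"
printf '%s\n' 4 2 1 3 8 10 9 4 > "$tmp/big.txt"

# NR==FNR is true only while reading the first file, so count ends up
# holding its line count; the bare condition prints lines beyond it.
awk_out=$(awk '
    { count += (NR == FNR) }
    count < FNR
' "$tmp/small.txt" "$tmp/big.txt")
printf '%s\n' "$awk_out"
```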