Is it possible to get the content of file1 minus file2 using a bash command?

Time:06-22

I have two files:

log.txt 
log.bak2022.06.20.10.00.txt

log.bak2022.06.20.10.00.txt is the backup of log.txt taken at 2022.06.20 10:00.

But log.txt keeps growing as new content is written to it.

Now I have a requirement: I want to get the content of log.txt minus log.bak2022.06.20.10.00.txt and write it into a new file. Is it possible to implement this?

CodePudding user response:

Assumptions:

  • the small file contains N lines, and these N lines are an exact match for the 1st N lines in the big file
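
The assumption above can be checked before trimming anything. This is a minimal sketch, assuming the sample file names small and big used in this answer and a head that supports -c (GNU and BSD do). The scratch directory and the prefix_ok variable are only for illustration:

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Recreate the sample inputs from this answer.
printf '%s\n' 4 2 1 3 > small
printf '%s\n' 4 2 1 3 8 10 9 4 > big

# Byte size of small; the arithmetic expansion strips any padding wc adds.
size=$(( $(wc -c < small) ))

# Compare the first $size bytes of big with small; cmp -s is silent and
# reports the result through its exit status only.
if head -c "$size" big | cmp -s - small; then
    prefix_ok=yes
else
    prefix_ok=no
fi
echo "small is a prefix of big: $prefix_ok"
```

If the check fails, the backup is not a pure prefix of the current file (for example after log rotation), and the position-based ideas below would cut at the wrong place.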

Sample inputs:

$ cat small
4
2
1
3

$ cat big
4
2
1
3
8
10
9
4

One comm idea:

$ comm --nocheck-order -13 small big
8
10
9
4

One awk idea:

$ awk '
FNR==NR { max=FNR; next }
FNR>max
' small big
8
10
9
4

One wc/sed idea:

$ max=$(wc -l < small)
$ ((max++))
$ sed -n "$max,$ p" big
8
10
9
4
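
Since the question asks for the result in a new file, the same line-count approach can also be written with tail -n and a redirect. This is a sketch of a swapped-in variant, not the sed command above, and new.txt is a hypothetical output name:

```shell
# Work in a scratch directory with the sample inputs from this answer.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 4 2 1 3 > small
printf '%s\n' 4 2 1 3 8 10 9 4 > big

# Number of lines in the backup.
max=$(( $(wc -l < small) ))

# tail -n +K starts printing at line K, so this skips the first max lines
# and writes the rest into the new file.
tail -n +"$(( max + 1 ))" big > new.txt

cat new.txt
```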

CodePudding user response:

  • You could try grep:
grep -vxFf log.bak2022.06.20.10.00.txt log.txt

That will output all the lines of log.txt that don't exactly match any line of log.bak2022.06.20.10.00.txt.

  • Or this one with tail and stat, which should be fast for big files:
tail -c +"$(( $(stat -c %s log.bak2022.06.20.10.00.txt) + 1 ))" log.txt

Note: stat -c %s is a GNU extension; on BSD you can use stat -f %z.

Remark: I'm not sure that tail -c uses fseek, but if it does, that would be great for this use case.
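
The two suggestions behave differently when a new line repeats a line from the backup. Here is a small sketch on made-up sample files. The names small, big, by_grep.txt and by_tail.txt are hypothetical, and wc -c is used in place of stat to sidestep the GNU/BSD difference:

```shell
# Work in a scratch directory with small sample inputs.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 4 2 1 3 > small
printf '%s\n' 4 2 1 3 8 10 9 4 > big

# grep filters by content: the final "4" also appears in small, so it is dropped.
grep -vxFf small big > by_grep.txt

# tail skips by position: start at the first byte after the backup's size.
tail -c +"$(( $(wc -c < small) + 1 ))" big > by_tail.txt

echo "grep:"; cat by_grep.txt    # 8, 10, 9
echo "tail:"; cat by_tail.txt    # 8, 10, 9, 4
```

If new lines can legitimately repeat old ones, as is common in logs, the positional approach is the safer fit.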

CodePudding user response:

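An awk-based solution that needs no Unix pipe (|) chains, regexes, function calls, or array splitting. The counter _ grows only while the first file is being read, so the condition holds only for lines of the second file beyond that count ({m,n,g}awk stands for mawk, nawk, or gawk):

{m,n,g}awk '(_ += NR==FNR) < FNR' FS='^$' small.txt big.txt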

8
10
9
4