Subtract the content of a file from another file

Time:09-06

I'm working on a bash script whose main objective is to create a .conf file containing the result of subtracting file 2 from file 1, that is, the lines of file 1 that do not appear in file 2.
Example:
File 1

ready   serv1   FBgn001bKJ
ready   serv2   FBgn003mLo  
ready   serv3   FBgn002lPx  
ready   serv4   FBgn000Pas  

File 2

ready   serv1   FBgn001bKJ
ready   serv4   FBgn000Pas

Result

ready   serv2   FBgn003mLo  
ready   serv3   FBgn002lPx

I've tried this function, but it doesn't produce the expected result:

COMPARE_FILES() {
awk '
    NR==FNR {a[FNR]=$0; next}
    {
        b=$0; gsub(/[0-9] /,"",b)
        c=a[FNR]; gsub(/[0-9] /,"",c)
        if (b != c) {printf "< %s\n> %s\n", $0, a[FNR]}
    }' "$1" "$2"
}

Any suggestions on how I can make it work? PS: the whitespace in the two files can differ!

CodePudding user response:

Assumptions:

  • each line within a file is unique (ie, no duplicate lines exist within a given file)
  • matching lines are 100% identical (this actually isn't the case with OP's data as I found a variable number of trailing spaces in some lines; I manually removed all trailing spaces before running the following solutions)

One comm idea:

$ comm -23 file1 file2
ready   serv2   FBgn003mLo
ready   serv3   FBgn002lPx

NOTE: comm requires that both input files already be sorted (as they are in OP's sample)
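
If the inputs are not guaranteed to be sorted, one option is to sort copies of them before calling comm. The following is only a minimal sketch: subtract_sorted is a hypothetical helper name, and it assumes the result does not need to keep file1's original line order (the output comes out sorted).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer):
# print the lines of $1 that do not appear in $2.
# Both inputs are sorted into temp files first because comm expects sorted input.
subtract_sorted() {
    local tmp1 tmp2
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # Use the same locale for sort and comm so they agree on ordering.
    LC_ALL=C sort "$1" > "$tmp1"
    LC_ALL=C sort "$2" > "$tmp2"
    # -2 drops lines unique to the second file, -3 drops lines common to both.
    LC_ALL=C comm -23 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}
```

Usage: subtract_sorted file1 file2 > result.conf (result.conf is just an example name).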

As for an awk solution:

$ awk 'FNR==NR {a[$0];next} !($0 in a)' file2 file1
ready   serv2   FBgn003mLo
ready   serv3   FBgn002lPx

NOTE: the 1st file fed to awk is file2


Modifying to remove trailing white space:

$ comm -23 <(sed 's/[[:space:]]*$//' file1) <(sed 's/[[:space:]]*$//' file2)
ready   serv2   FBgn003mLo
ready   serv3   FBgn002lPx

$ awk '{sub(/[[:space:]]*$/,"")} FNR==NR {a[$0];next} !($0 in a)' file2 file1
ready   serv2   FBgn003mLo
ready   serv3   FBgn002lPx
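
OP also mentions that the whitespace can differ between the two files, which may include the spacing between columns and not only trailing blanks. A possible variation on the awk idea, offered as a sketch under that assumption, compares a whitespace-normalized copy of each line while printing the original line unchanged. subtract_ws is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer):
# print the lines of $1 whose whitespace-normalized form is not found in $2.
subtract_ws() {
    awk '
        # Assigning $1 to itself makes awk rebuild $0 with single spaces
        # between fields and no leading/trailing blanks; use that as the key.
        { orig = $0; $1 = $1; key = $0 }
        # First file on the command line (file2): remember every key.
        # The FILENAME check keeps this rule from firing on file1 when file2 is empty.
        FNR == NR && FILENAME == ARGV[1] { seen[key]; next }
        # Second file (file1): print the untouched line if its key was never seen.
        !(key in seen) { print orig }
    ' "$2" "$1"
}
```

Usage: subtract_ws file1 file2 > result.conf. As in the commands above, file2 is the first file awk reads.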

CodePudding user response:

Assuming the files you are comparing are sorted, diff may also be an option:

$ diff --unchanged-group-format="" --old-group-format="%<" --new-group-format="" f1.txt f2.txt
ready   serv2   FBgn003mLo   
ready   serv3   FBgn002lPx

(I noted the extra white space in the OP's data as mentioned in the answer by @markp-fuso, but that did not affect my diff results)

CodePudding user response:

Why not with a simple grep?

grep -vxFf file2 file1
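
Here -f file2 reads the patterns from file2, -F treats them as fixed strings rather than regular expressions, -x requires a pattern to match the whole line, and -v keeps only the lines that do not match. Because OP's sample has trailing spaces on some lines, an exact whole-line match can miss them. The sketch below strips trailing whitespace from both files first and writes the result to a .conf file. make_conf and the output file name are hypothetical, and it assumes only trailing whitespace differs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not from the original answer):
# write to $3 the lines of $1 that do not appear in $2,
# ignoring trailing whitespace in both files.
make_conf() {
    local file1=$1 file2=$2 out=$3 pats
    pats=$(mktemp) || return 1
    # Patterns: file2 with trailing whitespace removed.
    sed 's/[[:space:]]*$//' "$file2" > "$pats"
    # Candidates: file1 with trailing whitespace removed, filtered by grep.
    sed 's/[[:space:]]*$//' "$file1" | grep -vxFf "$pats" > "$out"
    rm -f "$pats"
}
```

Usage: make_conf file1 file2 result.conf. Note that the lines written to the output have their trailing whitespace removed as well.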