file1.csv
col1,col2,col3
a,b,c
b,v,n
x,u,v
t,m,m
file2.csv
col1,col2,col3
p,m,n
a,z,i
col1
acts as a primary key in both files. If any value of col1
of file2.csv
occurs in file1.csv
, then this row will be discarded from file1.csv
.
output file:
col1,col2,col3
b,v,n
x,u,v
t,m,m
Note: I'm interested in Unix
based solution. please provide a solution sort
or uniq
cmd or anything else.
CodePudding user response:
There's a join
command on Unix that does pretty much exactly what you want:
join -v1 -t, \
<(tail 2 file1.txt | sort -k1 -t,) \
<(tail 2 file2.txt | sort -k1 -t,)
For the sample files you've given, this is its output:
b,v,n
t,m,m
x,u,v
Command Breakdown
join -v1 -t,
-v1
: display rows from the 1st file that are not pairable with lines in the 2nd file via the join column (default column used for joining is 1, but it's overridable via the-1
and-2
options)-t,
: use comma as a field/column separator
<(tail 2 file1.txt | sort -k1 -t,)
<( … )
: thejoin
command expects file names as arguments, so we use process substitution to create such temporary files from the output of the nested commandstail 2 file1.txt
: skip the header linesort -k1 -t,
: thejoin
command expects sorted files-t,
: use comma as a field/column separator-k1
: sort by the 1st field