I am looking to find matches anywhere between 2 csv files, email addresses specifically. If I run this command via bash (I would like to figure out something via bash for now):
grep -w --file <(sort -u file1.csv) <(sort -u file2.csv| cut d',' -f2)
it works. But there is surrounding character data in the same row that I would like to print out either to stdin or to a new file eventually. How do I do this diff comparison and still preserve the row data for printing?
file1.csv consists of:
[email protected]
file2.csv consists of:
,[email protected],123456,"""ABC, Blah Blah."""
I would like to do my comparison but join the ABC, Blah Blah data back to the final result.
Thank you!
CodePudding user response:
rquery (https://github.com/fuyuncat/rquery/releases) can handle the content of multiple files like SQL query.
[ rquery]$ cat samples/email1.csv
[email protected]
[email protected]
[ rquery]$ cat samples/email2.csv
,[email protected],123456,"""ABC, Blah Blah."""
,[email protected],123456,"""123, Blah Blah."""
,[email protected],123456,"""CBA, Blah Blah."""
,[email protected],123456,"""BBB, Blah Blah."""
,[email protected],123456,"""CCC, Blah Blah."""
[ rquery]$ ./rq -q "p d/,/\"\"/|m @1 where @fileid=1|s @raw | f @fileid=2 and @2 = @r[1][1]" samples/email1.csv samples/email2.csv
,[email protected],123456,"""ABC, Blah Blah."""
,[email protected],123456,"""CBA, Blah Blah."""
CodePudding user response:
When trying to join table data on indexes, SQLite3 is a nice tool to have as it is capable of directly importing CSV datafiles.
Here is an example:
file1.csv
:
[email protected]
[email protected]
file2.csv
:
[email protected],123456,"""ABC, Blah Blah."""
[email protected],7684652,"Foo, bar, baz."
[email protected],424255,"Hello kitten"
[email protected],6578218,"This to be output"
#! /bin/sh
sqlite3 :memory: << 'EOF'
CREATE TABLE emails (
email text NOT NULL PRIMARY KEY
);
CREATE TABLE contacts (
email text NOT NULL PRIMARY KEY,
phone text,
description text
);
.mode csv
.import file1.csv emails
.import file2.csv contacts
SELECT c.*
FROM emails AS e
LEFT JOIN contacts AS c
ON e.email = c.email;
EOF
Output from this program above:
[email protected],6578218,"This to be output"
[email protected],7684652,"Foo, bar, baz."
If your CSV files have column headers, then you can let SQLite3 handle tables creation for you:
#! /bin/sh
sqlite3 :memory: << 'EOF'
.mode csv
.headers on
.import file1.csv emails
.import file2.csv contacts
SELECT c.*
FROM emails AS e
LEFT JOIN contacts AS c
ON e.email = c.email;
EOF