Search on the last column with \ delimiter and save the email address associated with it to a variable

I have two files.

file1.txt contains:

META GAIN CORP
GG$
ABG$
PEPRA_UAT
12GHR
CC$
USDP_MAIN
XQ$
PR$
MIX_DEV

and file2.csv contains:

\\fr.usdp.org\SOLE\Home\RD,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\99 FLOOR,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\44 FLOOR,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,[email protected]
\\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,[email protected]
\\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,[email protected]
\\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DATABASE,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DB2 EDU,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DB2 EDU,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\NICE SHORT,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\PRO DEV,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\DUK 20154 USER,
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\DUK 20154 USER,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\FARE GRUST,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\XYZ GROUP,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\XYZ TEAM TOOLKIT,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\BILLING ELEMENT,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\RRT_SEC,[email protected]

I have this in my script, but it does not pick up the last path component correctly when the name contains spaces.

for sr in `cat file1.txt`; do
            sname=`echo ${sr} | awk -F: '{ print $1 }'`
            emdrs=`grep -Fw "${sname}" file2.csv | awk -F',' '{print$2}' | sed 's/[[:space:]]//' | xargs | sed -e 's/ /,/g'`
            echo "$sname || To: $emdrs" >> details.txt
done

details.txt output:

META || [email protected],[email protected],[email protected],[email protected]
GAIN || [email protected],[email protected],[email protected],[email protected]
CORP || [email protected],[email protected],[email protected],[email protected]

but what I want is this:

META GAIN CORP || To: [email protected],[email protected],[email protected],[email protected]

I should also be able to search for strings containing $ (such as ABG$), and duplicate email addresses should not be included:

ABG$ || To: [email protected],[email protected]

Any help will be greatly appreciated.

CodePudding user response：

Something like this?

while read -r sr; do
  emails="$(grep -F "\\${sr}," file2.csv | cut -d',' -f2 | sort -u | tr '\n' ',')"
  if [ -n "$emails" ]; then
    echo "$sr || To: ${emails%,}"
  fi
done < file1.txt

Some explanations:

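while read -r sr ... done < file1.txt - read file1.txt one whole line at a time, so names containing spaces (META GAIN CORP) are not split into separate words
grep -F - treat the pattern as a fixed string rather than a regular expression, so the $ in ABG$ is matched literally instead of as an end-of-line anchor
"\\${sr}," - the leading backslash and trailing comma restrict the match to the last path component, directly before the email column
cut -d',' -f2 - split each matching line at the comma and output only the 2nd field (the email address)
sort -u - sort the addresses and remove duplicates
tr '\n' ',' - replace newlines with commas
${emails%,} - remove the trailing comma
if [ -n "$emails" ] - only output if $emails is not empty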
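The pieces above can be tried on throwaway data. The following is only a sketch: lookup_emails is a hypothetical wrapper name, the example.com addresses are invented (the real ones are obscured in the question), and the extra grep -v is an addition that drops rows with an empty address column so they cannot leave a stray comma.

```shell
#!/bin/sh
# Sketch only: lookup_emails is a hypothetical wrapper around the loop above.
# $1: file with one share name per line; $2: CSV of "UNC path,address".
lookup_emails() {
  while IFS= read -r sr; do
    [ -n "$sr" ] || continue   # skip blank lines in the name list
    # Fixed-string match on "\NAME,", keep column 2, drop empty addresses
    # (an addition to the answer), de-duplicate, join with commas.
    emails=$(grep -F "\\${sr}," "$2" | cut -d',' -f2 |
             grep -v '^[[:space:]]*$' | LC_ALL=C sort -u | tr '\n' ',')
    if [ -n "$emails" ]; then
      printf '%s || To: %s\n' "$sr" "${emails%,}"
    fi
  done < "$1"
}

# Hypothetical sample data; the addresses are made up for illustration.
dir=$(mktemp -d)
printf '%s\n' 'META GAIN CORP' 'ABG$' 'MIX_DEV' > "$dir/file1.txt"
cat > "$dir/file2.csv" <<'EOF'
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,amy@example.com
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,bob@example.com
\\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,cat@example.com
\\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,cat@example.com
\\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,
EOF

lookup_emails "$dir/file1.txt" "$dir/file2.csv"
# prints:
#   META GAIN CORP || To: amy@example.com,bob@example.com
#   ABG$ || To: cat@example.com
```

MIX_DEV produces no line because nothing in the sample CSV matches it.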

CodePudding user response：

One awk idea (replaces OP's current for loop):

awk -F',|\\\' '                                         # field delimiter of "," or "\"
FNR==NR { srlist[$1]
          next
        }
        { email=$NF
          if (email == "") next
          sr=$(NF-1)

          if (sr in srlist && emlist[sr] !~ email) {    # skip duplicate email addresses
                delim=(emlist[sr]) ? "," : ""
                emlist[sr]=emlist[sr] delim email
             }
        }
END     { for (sr in emlist)
              print sr " || To: " emlist[sr]
        }
' file1.txt file2.csv

This generates:

ABG$ || To: [email protected],[email protected]
META GAIN CORP || To: [email protected],[email protected],[email protected],[email protected]

NOTES:

while a bit more typing than OP's current for loop, this approach requires a single scan of file2.csv and eliminates the 7 subprocess calls made on each pass through OP's for loop
for any appreciable volume of data an awk solution should be noticeably faster
for the sample data provided:
- 0.65 secs: awk
- 1.80 secs: bash/for-loop