Shell - filter string that contains from email address

I have an email log and I would like to write only the senders' email addresses to a file.

I have filtered the whole log using:

grep "to=<email@address>" input.log | grep "from=" > output.txt

Output is (edited for privacy):

Jun 26 09:21:21 X1-X5-mx postfix/cleanup[9164]: QueueID:XXX milter-reject: END-OF-MESSAGE from ipXX.ip-XX-XXX-XXX.eu[XXX.XXX.XXX.XX]: 5.7.1 Rejected by SPAM_FILTER (spam); from=<email@address> to=<email@address> proto=....

I would like to write only the from=<email@address> part to a separate file, ideally without the from=<> wrapper. The sender's email address varies from line to line.

Do you have any idea how to do this please?

CodePudding user response：

You can fold both greps into a single sed or Awk script. See also useless use of grep.

sed -n '/to=<email@address>/s/.*from=<\([^<>]*\).*/\1/p' input.log > output.txt

In brief, sed -n says not to print by default, the address expression /to=<...>/ says to operate only on lines matching that regex, and the substitution command s/...\(...\).../\1/p says to replace the whole match with just the part within the parentheses, that is, keep only the address inside from=<...>, and print the resulting line.
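To see the three pieces working together, here is a small sketch run against a made-up two-line log. The addresses, host names, queue IDs and the proto=ESMTP tail are placeholders I invented for the demo, not taken from the real log.

```shell
# Hypothetical sample log: only the first line is addressed to bob@example.org.
log=$(mktemp)
cat > "$log" <<'EOF'
Jun 26 09:21:21 mx postfix/cleanup[9164]: ABC123: milter-reject: END-OF-MESSAGE from host.example[192.0.2.1]: 5.7.1 Rejected by SPAM_FILTER (spam); from=<alice@example.com> to=<bob@example.org> proto=ESMTP
Jun 26 09:21:22 mx postfix/cleanup[9164]: ABC124: milter-reject: END-OF-MESSAGE from host.example[192.0.2.2]: 5.7.1 Rejected by SPAM_FILTER (spam); from=<carol@example.net> to=<dave@example.org> proto=ESMTP
EOF

# -n: no default output; /to=<...>/: select lines for this recipient;
# s/...\(...\).../\1/p: keep only the captured sender and print it.
senders=$(sed -n '/to=<bob@example.org>/s/.*from=<\([^<>]*\).*/\1/p' "$log")
printf '%s\n' "$senders"   # prints: alice@example.com

rm -f "$log"
```

The second line is skipped because its to=<...> does not match, so nothing is printed for it.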

If the recipient address comes from a variable, you need double quotes instead of single quotes, so that the shell expands the variable.

addr='email@address'
sed -n "/to=<$addr>/s/.*from=<\([^<>]*\).*/\1/p" input.log > output.txt
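One thing to watch for: the variable is pasted into a regex, so a dot in the address matches any character. A possible way around that, sketched here with made-up addresses and a hypothetical helper variable addr_re, is to backslash-escape the regex metacharacters before interpolating.

```shell
addr='bob.smith@example.org'   # hypothetical recipient

# Escape basic-regex metacharacters (and the / delimiter) in the address,
# so that "." only matches a literal dot.
addr_re=$(printf '%s\n' "$addr" | sed 's|[][\.*^$/]|\\&|g')

# Hypothetical sample log: the second recipient differs only where the dot is.
log=$(mktemp)
cat > "$log" <<'EOF'
x; from=<alice@example.com> to=<bob.smith@example.org> proto=ESMTP
x; from=<carol@example.net> to=<bobXsmith@example.org> proto=ESMTP
EOF

senders=$(sed -n "/to=<$addr_re>/s/.*from=<\([^<>]*\).*/\1/p" "$log")
printf '%s\n' "$senders"   # prints: alice@example.com

rm -f "$log"
```

Without the escaping step, the unescaped dot would also match the "X" in the second line and both senders would be printed.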

CodePudding user response：

If you have pcregrep or a grep that has -P, you can do:

grep -F 'to=<email@address>' input.log |
grep -Po '(?<=from=<)[^>]+' > from_addresses.txt

(?<=from=<) is a "look behind" that requires the text immediately preceding the desired match to be "from=<"; [^>]+ then matches everything from there up to, but not including, the closing angle bracket.
-o prints only the part of the line that matches.
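If your grep has -o but not -P, a rough fallback (my own substitution, not the look-behind approach above) is to match the from=< prefix as well and then cut it off. The sample lines and addresses below are made up for the demo.

```shell
# Hypothetical sample log.
log=$(mktemp)
cat > "$log" <<'EOF'
x; from=<alice@example.com> to=<bob@example.org> proto=ESMTP
x; from=<carol@example.net> to=<dave@example.org> proto=ESMTP
EOF

# grep -o emits "from=<alice@example.com"; cut keeps what follows the "<".
senders=$(grep -F 'to=<bob@example.org>' "$log" |
  grep -o 'from=<[^>]*' |
  cut -d'<' -f2)
printf '%s\n' "$senders"   # prints: alice@example.com

rm -f "$log"
```

This costs one extra process compared with the look-behind, which hardly matters for a one-off log search.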

Assuming Postfix logs have a fixed format where from immediately precedes to, you can use awk with a custom input field separator:

awk -v e='email@address' -F' (to|from)=<|>' \
    '$4==e{print $2}' input.log > from_addresses.txt
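To see why the sender ends up in $2 and the recipient in $4, it can help to dump every field for one line. This is only a sketch on a made-up, shortened log line; the addresses are placeholders.

```shell
# Hypothetical, shortened log line.
line='x; from=<alice@example.com> to=<bob@example.org> proto=ESMTP'

# Print each field in brackets so that empty fields are visible.
fields=$(printf '%s\n' "$line" |
  awk -F' (to|from)=<|>' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
# $1=[x;]
# $2=[alice@example.com]
# $3=[]
# $4=[bob@example.org]
# $5=[ proto=ESMTP]

# The same test as in the command above, applied to the single sample line.
sender=$(printf '%s\n' "$line" |
  awk -v e='bob@example.org' -F' (to|from)=<|>' '$4==e{print $2}')
printf '%s\n' "$sender"   # prints: alice@example.com
```

The empty $3 is the gap between the ">" that closes the sender and the " to=<" that opens the recipient, which is why the recipient is $4 rather than $3.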