Linux count how many mails a certain domain sends per date

Time:03-07

I have some logs from a Linux SMTP server that runs Postfix. I want to process the logs to find out how many mails a certain domain sends per date, without writing a script.

For example my mail.log file has these contents:

Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 874BE4587C4)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 98C484E1571)
Jan  2 10:08:15 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4456D154E12)
Jan  2 15:07:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4F54515C154)
Jan  2 14:59:11 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9856C984E16)
Feb  1 13:14:35 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as EC1874415E8)

The output I want is:

- First the domain/address the mail is sent from

- The number of mails that domain sends per date (e.g. Jan 1: 2 mails sent)

So here the output should look something like:

http://mail.example.org[127.0.0.1]:25
Jan 1 2
Jan 2 1
Feb 1 1

http://mail.example2.org[127.0.0.1]:25
Jan 1 1
Jan 2 2

For now I have 2 commands that can do these operations separately, but I really have no idea how to combine them:

1. Count how many mails a certain domain sends in total:

[user@linux ~]$ grep -h "status=sent" mail.log | cut -d' ' -f9 | awk '{c[$0] += 1} END {for(i in c){printf "%6s %4d\n", i, c[i]}}' | sort -M

relay=http://mail.example2.org[127.0.0.1]:25,    3
relay=http://mail.example.org[127.0.0.1]:25,    4

2. Count how many mails are sent per day

[user@linux ~]$ grep -h "status=sent" mail.log | cut -c-6 | awk '{c[$0] += 1} END {for(i in c){printf "%6s %4d\n", i, c[i]}}' | sort -k2

Feb  1    1
Jan  1    3
Jan  2    3

Does anyone know a good command that can help me with this specific operation? Any help would be appreciated, thank you!

CodePudding user response:

With the samples shown, please try the following awk code. It was written and tested in GNU awk but should work with any awk version.

awk '
{
  gsub(/^relay=|,$/,"",$8)
}
{
  arr1[$1 OFS $2 OFS $8]++
}
END{
  for(i in arr1){
    split(i,arr2)
    arr3[arr2[3]]=(arr3[arr2[3]]?arr3[arr2[3]] ORS:"") (arr2[1] OFS arr2[2] OFS arr2[4] OFS arr1[i])
  }
  for(i in arr3){
    print i ORS arr3[i]
  }
}
'  Input_file

Explanation: In the main block, gsub removes the leading relay= and the trailing comma from the 8th field. arr1 is then indexed by $1 OFS $2 OFS $8 (month, day, relay), and its value is incremented by 1 for every line with that index, across all lines of Input_file. In the END block, the code walks every element of arr1 and splits the index i into arr2. It then builds arr3, indexed by arr2's 3rd element (the http value from Input_file), and appends arr2[1] OFS arr2[2] OFS arr2[4] OFS arr1[i] to that entry, with ORS between successive entries. Once arr3 is complete, a for loop prints each index, then ORS (a newline), then the value of arr3, which holds the per-date counts for that relay.
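
As a rough sketch of the same counting idea, the variant below keeps only status=sent lines, finds relay= wherever it appears on the line instead of relying on field 8, and prints one flat line per relay and date in calendar order. The function name count_flat and the month-number sort key are illustrative choices of mine, not part of the answer above, and it assumes the log uses the three-letter English month names shown in the question.

```shell
# Hypothetical helper: one "relay month day count" line per relay and date.
count_flat() {
    awk '
    /status=sent/ && match($0, /relay=[^, ]*/) {
        # "relay=" is 6 characters long, so skip it and keep the value
        relay = substr($0, RSTART + 6, RLENGTH - 6)
        count[relay " " $1 " " $2]++
    }
    END {
        months = "JanFebMarAprMayJunJulAugSepOctNovDec"
        for (key in count) {
            split(key, part, " ")                        # relay, month, day
            monthnum = (index(months, part[2]) + 2) / 3  # Jan -> 1 ... Dec -> 12
            print part[1], monthnum, part[3], part[2], count[key]
        }
    }' "$@" |
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n |                  # relay, then month, then day
    awk '{ print $1, $4, $3, $5 }'                       # drop the numeric sort key
}
```

Run it as count_flat mail.log. LC_ALL=C keeps the relay ordering independent of the current locale.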

CodePudding user response:

Assumptions:

  • a line can have at most one instance of the string relay=
  • relay= may not always show up in the same space-delimited field
  • output for a given domain/address should be in calendar order (which in this case should also be the order in which dates are read from mail.log)

Adding a couple of lines that do not include status=sent:

$ cat mail.log
Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 14:17:27 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=rejected (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 874BE4587C4)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 98C484E1571)
Jan  2 10:08:15 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4456D154E12)
Jan  2 12:13:31 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=rejected (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  2 15:07:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4F54515C154)
Jan  2 14:59:11 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9856C984E16)
Feb  1 13:14:35 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as EC1874415E8)

One idea using GNU awk (for array of arrays):

awk '
BEGIN         { regex = "\\<relay=[^, ]" }

/status=sent/ { date=$1 FS $2
                addr=""

                for (i=3;i<=NF;i++) {                     # loop through fields looking for "relay="
                    if ($i ~ regex) {                     # and if found then parse out the domain/address
                       split($i,arr,"=")
                       addr=arr[2]
                       gsub(",","",addr)
                       continue
                    }
                }

                if (addr != "") {                         # if we found an address then increment our counter
                   counts[addr][date]++

                   if (date != prevdate) {                # and keep track of the order in which dates have been processed
                      dates[++dtorder]=date
                      prevdate=date
                   }
                }
              }

END           { for (addr in counts) {
                    print addr

                    for (i=1;i<=dtorder;i++)              # loop through dates[] in the same order in which they were processed
                        if (dates[i] in counts[addr])
                           print dates[i],counts[addr][dates[i]]
                }
              }
' mail.log

NOTES:

  • for (addr in counts) is not guaranteed to process array entries in any specific order
  • dates[++dtorder]=date is used to keep track of the order in which dates are processed; this is then used in the END{...} processing to ensure we output dates in the same order; this assumes dates show up in mail.log in calendar order, which in turn eliminates the need to figure out how to sort Jan, Feb, Mar, etc. in calendar order

This generates:

http://mail.example2.org[127.0.0.1]:25
Jan 1 1
Jan 2 2
http://mail.example.org[127.0.0.1]:25
Jan 1 2
Jan 2 1
Feb 1 1
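
If GNU awk is not available, or if the relay order should be predictable as well, something along these lines might work in any POSIX awk. It is a sketch only: it stores counts under a combined (addr, date) key instead of an array of arrays, and remembers the order in which relays and dates are first seen. The function name count_by_relay and the helper arrays are my own naming, and it still assumes mail.log is in calendar order.

```shell
# Hypothetical helper: grouped output, relays and dates in first-seen order.
count_by_relay() {
    awk '
    /status=sent/ && match($0, /relay=[^, ]*/) {
        addr = substr($0, RSTART + 6, RLENGTH - 6)   # text after "relay="
        date = $1 " " $2

        # remember each relay and each date the first time it is seen
        if (!(addr in seen_addr)) { seen_addr[addr] = 1; addrs[++na] = addr }
        if (!(date in seen_date)) { seen_date[date] = 1; dates[++nd] = date }

        counts[addr, date]++                         # SUBSEP-joined key
    }
    END {
        for (a = 1; a <= na; a++) {
            print addrs[a]
            for (d = 1; d <= nd; d++)
                if ((addrs[a], dates[d]) in counts)
                    print dates[d], counts[addrs[a], dates[d]]
        }
    }' "$@"
}
```

Run it as count_by_relay mail.log. Because the relays are printed from the addrs[] list rather than with for (addr in counts), the relay that appears first in the log is printed first.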