Top 10 Hosts/IP's within time duration

Time:03-03

I am trying to build a command that shows the top 10 hosts/IPs in access.log from the Linux command line, for the period from 2019-07-10 00:00:00 up to and including 2019-07-19 23:59:59. I have the first part working, i.e. the top 10 IPs, using the following:

awk '{ print $1}' access.log | sort | uniq -c | sort -nr | head -n 10

I am just trying to work out how to restrict this to the time range above. Any help would be great.

CodePudding user response:

The GNU version of awk, gawk, can be told to loop through an associative array in a particular order. Below, with @val_num_desc, I ask gawk to use the value as the sorting key, interpret it as a number, and sort in descending order.

gawk -F' ' 'BEGIN {PROCINFO["sorted_in"] = "@val_num_desc"} \
    /2019-07-10 00:00:00/,/2019-07-19 23:59:59/ {hosts[$1]++} \
    END {for (host in hosts) {if (count++ == 10){exit 1} \
                              printf("%s %s\n", host,hosts[host])}}' \
access.log

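The /FROM/,/TO/ range pattern relies on lines containing exactly 2019-07-10 00:00:00 and 2019-07-19 23:59:59 being present in the file. If the start timestamp never appears, nothing is counted; if the end timestamp never appears, the range runs to the end of the file. The range also closes at the first line that matches the end pattern.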

Given a made-up file such as /tmp/access.log:

$ cat /tmp/access.log
192.168.1.2 2019-07-09 23:00:00
192.168.1.2 2019-07-09 23:41:00
192.168.1.2 2019-07-09 23:58:00
192.168.1.5 2019-07-10 00:00:00
192.168.1.1 2019-07-10 00:34:00
192.168.1.1 2019-07-10 00:42:00
192.168.1.2 2019-07-10 00:59:00
192.168.1.2 2019-07-10 01:02:00
192.168.1.2 2019-07-10 01:12:00
192.168.1.2 2019-07-10 01:00:00
192.168.1.3 2019-07-10 02:00:00
192.168.1.3 2019-07-10 03:00:00
192.168.1.3 2019-07-10 04:00:00
192.168.1.3 2019-07-10 05:00:00
192.168.1.1 2019-07-10 06:00:00
192.168.1.1 2019-07-19 01:00:00
192.168.1.6 2019-07-19 23:59:59
192.168.1.6 2019-07-20 02:00:00
192.168.1.6 2019-07-20 04:00:00
$ gawk -F' ' 'BEGIN {PROCINFO["sorted_in"] = "@val_num_desc"} \
    /2019-07-10 00:00:00/,/2019-07-19 23:59:59/ {hosts[$1]++} \
    END {for (host in hosts) {if (count++ == 10){exit 1} \
                              printf("%s %s\n", host,hosts[host])}}' \
/tmp/access.log
192.168.1.3 4
192.168.1.2 4
192.168.1.1 4
192.168.1.6 1
192.168.1.5 1
$ 

Alternatively, starting from @Coopsre's original pipeline and keeping only the lines in the inclusive range /from-pattern/,/to-pattern/:

$ awk '/2019-07-10 00:00:00/,/2019-07-19 23:59:59/ { print $1}' /tmp/access.log | sort | uniq -c | sort -nr | head -n 10
   4 192.168.1.3
   4 192.168.1.2
   4 192.168.1.1
   1 192.168.1.6
   1 192.168.1.5
$ 
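
If the exact boundary timestamps might be missing from the log, one alternative is to compare each line's timestamp as a string instead of using a range pattern. This is only a minimal sketch: it assumes the made-up log format above (IP in field 1, date in field 2, time in field 3) and relies on YYYY-MM-DD HH:MM:SS timestamps sorting correctly as plain strings. The function name top_hosts is hypothetical, and a real access.log will most likely need different field handling.

```shell
# Hypothetical helper: top_hosts FROM TO FILE
# Assumes every line looks like "IP YYYY-MM-DD HH:MM:SS".
top_hosts() {
    awk -v from="$1" -v to="$2" '
        { ts = $2 " " $3 }                    # rebuild the timestamp from fields 2 and 3
        ts >= from && ts <= to { print $1 }   # string comparison, inclusive at both ends
    ' "$3" | sort | uniq -c | sort -nr | head -n 10
}
```

Called as top_hosts '2019-07-10 00:00:00' '2019-07-19 23:59:59' /tmp/access.log, this should select the same lines as the range pattern for the sample file. It also keeps working when no line carries exactly 00:00:00 or 23:59:59, and when lines are slightly out of order.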