I am trying to build a command that shows the top 10 hosts/IPs in access.log from the Linux command line, for the period from 2019-07-10 00:00:00 up to and including 2019-07-19 23:59:59. I have the first part working, i.e. the top 10 IPs, using the following:
awk '{ print $1}' access.log | sort | uniq -c | sort -nr | head -n 10
I just need to work out how to restrict it to the time range above. Any help would be great.
Answer:
The GNU version of awk, gawk, can be told to loop through an associative array in a particular order. Below, with @val_num_desc, I ask gawk to use each element's value as the sorting key, interpret it as a number, and traverse the array in descending order.
gawk -F' ' 'BEGIN {PROCINFO["sorted_in"] = "@val_num_desc"} \
/2019-07-10 00:00:00/,/2019-07-19 23:59:59/ {hosts[$1]++} \
END {for (host in hosts) {if (count++ == 10) break; \
printf("%s %s\n", host, hosts[host])}}' \
access.log
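The /FROM/,/TO/ range pattern relies on lines stamped exactly 2019-07-10 00:00:00 and 2019-07-19 23:59:59 being present in the file: the range opens at the first line matching the start pattern and closes at the first later line matching the end pattern. If the start line is missing, nothing is selected; if the end line is missing, everything up to the end of the file is selected.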
Given a made-up file such as /tmp/access.log:
$ cat /tmp/access.log
192.168.1.2 2019-07-09 23:00:00
192.168.1.2 2019-07-09 23:41:00
192.168.1.2 2019-07-09 23:58:00
192.168.1.5 2019-07-10 00:00:00
192.168.1.1 2019-07-10 00:34:00
192.168.1.1 2019-07-10 00:42:00
192.168.1.2 2019-07-10 00:59:00
192.168.1.2 2019-07-10 01:02:00
192.168.1.2 2019-07-10 01:12:00
192.168.1.2 2019-07-10 01:00:00
192.168.1.3 2019-07-10 02:00:00
192.168.1.3 2019-07-10 03:00:00
192.168.1.3 2019-07-10 04:00:00
192.168.1.3 2019-07-10 05:00:00
192.168.1.1 2019-07-10 06:00:00
192.168.1.1 2019-07-19 01:00:00
192.168.1.6 2019-07-19 23:59:59
192.168.1.6 2019-07-20 02:00:00
192.168.1.6 2019-07-20 04:00:00
$ gawk -F' ' 'BEGIN {PROCINFO["sorted_in"] = "@val_num_desc"} \
/2019-07-10 00:00:00/,/2019-07-19 23:59:59/ {hosts[$1]++} \
END {for (host in hosts) {if (count++ == 10) break; \
printf("%s %s\n", host, hosts[host])}}' \
/tmp/access.log
192.168.1.3 4
192.168.1.2 4
192.168.1.1 4
192.168.1.6 1
192.168.1.5 1
$
Or using the initial commands of @Coopsre as a starting point, keeping lines in the inclusive range /from-pattern/,/to-pattern/:
$ awk '/2019-07-10 00:00:00/,/2019-07-19 23:59:59/ { print $1}' /tmp/access.log | sort | uniq -c | sort -nr | head -n 10
4 192.168.1.3
4 192.168.1.2
4 192.168.1.1
1 192.168.1.6
1 192.168.1.5
$
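If the log has no line stamped exactly 2019-07-10 00:00:00 or 2019-07-19 23:59:59, another option is to compare each line's timestamp with the bounds as a string, since YYYY-MM-DD HH:MM:SS sorts chronologically as plain text. A minimal sketch, assuming the date is in field 2 and the time in field 3 as in the made-up file; top_hosts is a hypothetical helper name, and a real access.log would likely need its timestamp converted to this form first:

```shell
# Assumption: field 2 is the date (YYYY-MM-DD) and field 3 the time (HH:MM:SS),
# as in the made-up /tmp/access.log. top_hosts is a hypothetical helper name.
top_hosts() {
    # $1 = log file, $2 = first timestamp, $3 = last timestamp (both inclusive)
    awk -v from="$2" -v to="$3" '
        { ts = $2 " " $3 }                    # rebuild "YYYY-MM-DD HH:MM:SS"
        ts >= from && ts <= to { print $1 }   # string comparison, bounds included
    ' "$1" | sort | uniq -c | sort -nr | head -n 10
}
```

Called as top_hosts /tmp/access.log "2019-07-10 00:00:00" "2019-07-19 23:59:59", this should give the same counts as the range-pattern commands on the sample file, and it does not depend on the log being in chronological order.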