Count unique IPs from access.log by day and minute

Time:09-16

I'm wondering if it is possible to count unique IPs by minute on a specific day (Apache access.log) on Ubuntu.

I already found this useful command, which gives the number of requests per minute for a given day, but unfortunately I can't get it to count the IPs instead of the request lines:

grep "06/Sep/2021" access.log | cut -d[ -f2 | cut -d] -f1 |
awk -F: '{print $2":"$3}' | sort -nk1 -nk2 | uniq -c

My own attempt isn't very good:

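# list the distinct HH:MM values seen on 06/Sep/2021
grep "06/Sep/2021" access.log | awk '{print substr($4,14,5)}' | sort | uniq |
while read p; do
  # count distinct IPs on lines containing that HH:MM (note: this greps the whole file, not just that day)
  count=`grep $p access.log | awk '{print $1}' | sort | uniq | wc -l`
  echo $count $p
done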

Sample Apache access.log:

11.111.111.111 - - [06/Sep/2021:01:51:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:52:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:53:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:54:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:55:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:56:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:57:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:58:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.112 - - [06/Sep/2021:01:58:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:59:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:01:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:02:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:03:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:04:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:05:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:06:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:07:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:08:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:09:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.112 - - [06/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146

Expected Output:

1 01:51
1 01:52
1 01:53
1 01:54
1 01:55
1 01:56
1 01:57
2 01:58
1 01:59
1 02:01
1 02:02
1 02:03
1 02:04
1 02:05
1 02:06
1 02:07
1 02:08
1 02:09
2 02:10

Answer:

Here is a GNU awk (gawk) solution that does this in a single command:

awk -v dt="06/Sep/2021" '
# strip "[dd/Mon/yyyy:" and ":SS" from $4, leaving HH:MM, then count it
$0 ~ dt && gsub(/^[^:]+:|:[0-9]+$/, "", $4) {   fq[$4]++ }
END {
   PROCINFO["sorted_in"]="@ind_str_asc"
   for (i in fq)
      print i, fq[i]
}' file.log

01:51 1
01:52 1
01:53 1
01:54 1
01:55 1
01:56 1
01:57 1
01:58 2
01:59 1
02:01 1
02:02 1
02:03 1
02:04 1
02:05 1
02:06 1
02:07 1
02:08 1
02:09 1
02:10 2

PROCINFO["sorted_in"]="@ind_str_asc" is used to sort the keys in ascending string order. PROCINFO["sorted_in"] is a gawk extension, so this needs GNU awk.
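The command above counts requests per minute, which matches the expected output only because each IP makes at most one request per minute in the sample. If the goal is really distinct IPs per minute, as the title asks, one option is to remember each IP/minute pair the first time it is seen. This is a minimal sketch in POSIX awk (no PROCINFO, so it pipes to sort instead), assuming the standard Apache log layout where $1 is the client IP and $4 looks like [06/Sep/2021:01:51:43; the function name unique_ips_per_minute is a hypothetical helper, not part of any tool.

```shell
# Hypothetical helper: print "<distinct IPs> <HH:MM>" for one day of an Apache access log.
# Usage: unique_ips_per_minute 06/Sep/2021 access.log
unique_ips_per_minute() {
    awk -v dt="$1" '
        # assumes $4 looks like "[06/Sep/2021:01:51:43"; match the date only there
        index($4, "[" dt ":") == 1 {
            minute = substr($4, 14, 5)      # "01:51"
            if (!seen[minute, $1]++)        # first time this IP shows up in this minute
                count[minute]++
        }
        END { for (m in count) print count[m], m }
    ' "$2" | LC_ALL=C sort -k2              # order by HH:MM
}
```

Checking the date only inside the timestamp field also avoids counting a line where the date string happens to appear elsewhere, such as in a URL.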

Answer:

With your shown samples, please try the following awk program.

awk -v dt="06/Sep/2021" '
$0 ~ dt && match($0,/\[[^ ]*/){
  # RSTART+13 skips "[dd/Mon/yyyy:"; RLENGTH-16 keeps the 5 chars of HH:MM
  arr[substr($0,RSTART+13,RLENGTH-16)]++
}
END{
  for(key in arr){
    print key,arr[key]
  }
}
'  Input_file | sort -k1

Explanation: The awk program reads Input_file with an awk variable dt set to 06/Sep/2021. In the main block it checks whether the line contains dt AND uses the match function to find the text from [ up to the next space (for example [06/Sep/2021:02:01:43). The HH:MM part of that match is used as the index of the array arr, which is incremented for every matching line. In the END block the program loops over arr and prints each key with its count, and the output is sent to sort to get it in sorted order.
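To see where the RSTART+13 and RLENGTH-16 offsets come from: match() sets RSTART to the position of [ and RLENGTH to the length of [06/Sep/2021:01:51:43, which is 21 characters. Starting 13 characters after [ lands on the hour, and 21 - 16 = 5 characters gives HH:MM. A small sketch that prints these values for one sample line, assuming the timestamp format from the question; show_offsets is a hypothetical helper name:

```shell
# Hypothetical helper: print RSTART, RLENGTH and the extracted HH:MM for one log line.
show_offsets() {
    printf '%s\n' "$1" | awk 'match($0, /\[[^ ]*/) {
        # the match covers "[dd/Mon/yyyy:HH:MM:SS": 1 + 11 + 1 + 8 = 21 characters
        print RSTART, RLENGTH, substr($0, RSTART + 13, RLENGTH - 16)
    }'
}

show_offsets '11.111.111.111 - - [06/Sep/2021:01:51:43 +0200] "GET / HTTP/1.1" 200 30783'
# prints: 20 21 01:51
```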

Answer:

Does this script work for you?

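#!/usr/bin/env bash

path="/etc/httpd/logs/access_log"    # RHEL/CentOS path; on Ubuntu it is usually /var/log/apache2/access.log
m=0

while [[ $m -lt 60 ]]; do
    mm=$(printf '%02d' "$m")         # zero-padded minute, e.g. 05
    # split on - : [ / so $4 is the day, $7 the hour and $8 the minute;
    # 15 is the day of the month to report (use 06 for the sample above)
    awk -F"[-:[/]" -v OFS=":" '$4==15{print $1" "$7,$8}' "$path" |
    awk -v m="$mm" '$0 ~ (":" m "$")' |    # keep lines for this minute only
    uniq -c |                              # requests per IP and HH:MM
    awk '{$2=""}1'                         # drop the IP column
    ((m++))
done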
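The loop above rereads the whole log once per minute value, and uniq -c counts requests per IP rather than distinct IPs. A single-pass alternative, in the spirit of the question's own cut/sort/uniq pipeline, is to emit one "HH:MM IP" pair per request, drop duplicate pairs with sort -u, and then count how many pairs remain for each minute. This is a sketch assuming the same log format as the question; the function name ips_per_minute and its arguments are hypothetical:

```shell
# Hypothetical helper: print "<distinct IPs> <HH:MM>" per minute for one day.
# Usage: ips_per_minute 06/Sep/2021 /var/log/apache2/access.log
ips_per_minute() {
    grep -F "[$1:" "$2" |                             # lines whose timestamp starts with that day
        awk '{ print substr($4, 14, 5), $1 }' |       # "HH:MM IP"
        LC_ALL=C sort -u |                            # one line per (minute, IP) pair
        awk '{ print $1 }' |                          # keep only the minute
        uniq -c |                                     # count distinct IPs per minute
        awk '{ print $1, $2 }'                        # trim the padding added by uniq -c
}
```

Because sort -u already orders the pairs by minute, the result comes out in HH:MM order without a separate sort step.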

Answer:

Assumptions:

  • IP address does not matter: although the title and body of the question mention 'unique IPs', the expected output contains no IP addresses and its counts do not appear to be split by IP

Adding a few lines with different dates:

$ cat access.log
11.111.111.111 - - [03/Sep/2021:01:51:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [01/Sep/2021:01:52:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
... all of the lines from OP's sample input ...
11.111.111.112 - - [07/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.112 - - [10/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146

One awk idea that replaces all of the grep/awk/while/cut/sort/uniq code (and, since it reads the log only once instead of once per minute, runs quite a bit faster to boot):

awk -v dt='06/Sep/2021' '
$0 ~ dt { split($0,timestamp,"[][]")        # timestamp[2] = "06/Sep/2021:01:51:43 +0200"
          split(timestamp[2],hrmin,":")     # hrmin[2] = hour, hrmin[3] = minute
          count[hrmin[2]":"hrmin[3]]++
        }
END     {for (i in count)
             print count[i],i
        }
' access.log

This generates:

1 01:51
1 01:52
1 01:53
1 01:54
1 01:55
1 01:56
1 01:57
2 01:58    # ip addresses 11.111.111.11{1,2}
1 01:59
1 02:01
1 02:02
1 02:03
1 02:04
1 02:05
1 02:06
1 02:07
1 02:08
1 02:09
2 02:10    # ip addresses 11.111.111.11{1,2}

NOTE: for this exercise the output happened to come out in hh:mm order, but the iteration order of for (i in count) is unspecified in awk; if your awk prints the lines in a different order, the output can be sorted by piping the awk output to sort -k2
