Home > Enterprise >  Count how many times an ip has accessed a url
Count how many times an ip has accessed a url

Time:11-16

I can print the ip and url from a massive log file, but I need to list how many times an ip has visited that url. I have done some research about throwing the log in a database, but I specifically need to do all of this in Python. any help is very appreciated.

My Code so far:

#!/usr/bin/python3
count = 0
log = open("access.log-20201019", "r")
arr = []
frequency_array = []

for i in log.readlines():
        ip = i[0:14]
        ip2 = ip.split(' ')
        ip3 = ip2[0]
        #print(ip3)
        url =i[53:87]
        url2 = url.split()
        url3 = url2[0]
        print(ip3,url3)

Snippet of Log file:

66.177.237.17 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/1/latest.jpeg HTTP/2.0" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36" "-"
158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 200 37145 "-" "curl/7.46.0" "-"
112.198.71.230 - - [18/Oct/2020:03:06:09 -0400] "GET /precip/raingauge2.gif HTTP/2.0" 200 10078 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36" "-"
173.9.45.97 - - [18/Oct/2020:03:06:10 -0400] "GET /NHPR/NHPR_rad_an.gif HTTP/2.0" 200 587317 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" "-"
173.9.45.97 - - [18/Oct/2020:03:06:11 -0400] "GET /favicon.ico HTTP/2.0" 200 27877 "https://vortex.plymouth.edu/NHPR/NHPR_rad_an.gif" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" "-"
158.136.64.65 - - [18/Oct/2020:03:06:11 -0400] "GET /webcam/1/nograph.1.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:11 -0400] "GET /webcam/1/nograph.1.jpeg HTTP/1.1" 200 242804 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 404 2256 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:14 -0400] "GET /webcam/rwis/lafeyette/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:14 -0400] "GET /webcam/rwis/lafeyette/latest.jpeg HTTP/1.1" 200 36974 "-" "curl/7.46.0" "-"

I am able to run my current code, but will output the ip and url multiple times for the same ip. I just want the number of times an ip visited a certain url.

CodePudding user response:

I would use a python dictionary, use the IP address as a key. Check if the ip exists in the keys list everytime I come across an IP. It will be something like:

if k in dict.keys(): dict[k] = 1

else: dict[k] = 1

Or you could make use of dict.setdefault() and set all to 1 and add everytime you find an IP

CodePudding user response:

I hope I've understood your question right:

text = """\
66.177.237.17 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/1/latest.jpeg HTTP/2.0" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36" "-"
158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 200 37145 "-" "curl/7.46.0" "-"
112.198.71.230 - - [18/Oct/2020:03:06:09 -0400] "GET /precip/raingauge2.gif HTTP/2.0" 200 10078 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36" "-"
173.9.45.97 - - [18/Oct/2020:03:06:10 -0400] "GET /NHPR/NHPR_rad_an.gif HTTP/2.0" 200 587317 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" "-"
173.9.45.97 - - [18/Oct/2020:03:06:11 -0400] "GET /favicon.ico HTTP/2.0" 200 27877 "https://vortex.plymouth.edu/NHPR/NHPR_rad_an.gif" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" "-"
158.136.64.65 - - [18/Oct/2020:03:06:11 -0400] "GET /webcam/1/nograph.1.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:11 -0400] "GET /webcam/1/nograph.1.jpeg HTTP/1.1" 200 242804 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 404 2256 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:14 -0400] "GET /webcam/rwis/lafeyette/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:14 -0400] "GET /webcam/rwis/lafeyette/latest.jpeg HTTP/1.1" 200 36974 "-" "curl/7.46.0" "-"
"""

import re
from collections import Counter

pat = re.compile(r"([\d.] ).*?(?:GET|POST|PUT|PATCH|DELETE|OPTIONS|HEAD) (\S )")

cnt = Counter()
for line in text.splitlines():
    m = pat.match(line)
    if m:
        cnt.update([m.groups()])

for (ip, url), how_many_times in cnt.items():
    print(f"{ip} has visited [{url}] {how_many_times} time(s)")

Prints:

66.177.237.17 has visited [/webcam/1/latest.jpeg] 1 time(s)
158.136.64.65 has visited [/webcam/rwis/littlebay/latest.jpeg] 2 time(s)
112.198.71.230 has visited [/precip/raingauge2.gif] 1 time(s)
173.9.45.97 has visited [/NHPR/NHPR_rad_an.gif] 1 time(s)
173.9.45.97 has visited [/favicon.ico] 1 time(s)
158.136.64.65 has visited [/webcam/1/nograph.1.jpeg] 2 time(s)
158.136.64.65 has visited [/webcam/rwis/echolake/latest.jpeg] 2 time(s)
158.136.64.65 has visited [/webcam/rwis/lafeyette/latest.jpeg] 2 time(s)

EDIT: To read data from file you can try:

import re
from collections import Counter

pat = re.compile(r"([\d.] ).*?(?:GET|POST|PUT|PATCH|DELETE|OPTIONS|HEAD) (\S )")

cnt = Counter()
with open("your_log_file.txt", "r") as f_in:
    for line in f_in:
        m = pat.match(line)
        if m:
            cnt.update([m.groups()])

for (ip, url), how_many_times in cnt.items():
    print(f"{ip} has visited [{url}] {how_many_times} time(s)")
  • Related