I am currently trying to write a program that will extract the most frequently occurring IP address from a txt file

Example of the information in the file: 172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437

The file contains a list of entries like the one in the example. I have to read the file into Python, extract the IP addresses, and display the most frequently occurring one along with how many times it occurs.

from statistics import mode

def getinput():
    
    d = {}
    file = open("sample1.txt")
    for x in file:
        f = x.split(" ")
        d.update({f[0].strip(): f[0].strip()})

    return d

def counter(d):
    count = mode(d)
    occurences = 0

    for i  in d:
        if i == mode:
            occurences = occurences + 1


    return count,occurences


def display(count,occurences):
    print(count)
    print(occurences)

    
d=getinput()
count,occurences=counter(d)
display(count,occurences)

This is what I have done so far. However, using mode it only displays the first IP in the list, and the occurrence count doesn't seem to work, as it only displays "0".

CodePudding user response：

Python already offers a counter: collections.Counter.

You could use a generator to avoid building an intermediate data structure, which helps especially when there are many repeated values.

import re
from collections import Counter

def get_ips(fname):
    # a pattern to match an IPv4 address at the start of a line
    ip_re = re.compile(r'^\s*(\d+\.\d+\.\d+\.\d+)')
    with open(fname) as file:
        for x in file:
            # extract the IP from the line
            # ignore the line if it does not have an IP
            ip_match = ip_re.search(x)
            if ip_match is not None:
                # group(1) is the pattern in parentheses, the IP
                yield ip_match.group(1)

ips = Counter(get_ips("sample1.txt"))
print(ips.most_common(10))  # the 10 most frequent IPs with their counts
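
To get just the single most frequent address together with its count, which is what the question asks for, Counter.most_common(1) returns a one-element list of (value, count) pairs. A minimal sketch, using a hard-coded list in place of the file; the sample_ips list and the most_frequent_ip helper are made up for illustration:

```python
from collections import Counter

def most_frequent_ip(ips):
    """Return (ip, count) for the most common item, or None if ips is empty."""
    counts = Counter(ips)
    if not counts:
        return None
    # most_common(1) returns a list of the form [(ip, count)]
    return counts.most_common(1)[0]

# hypothetical data standing in for the IPs read from the log file
sample_ips = ["172.16.121.170", "172.16.121.172", "172.16.121.172"]
ip, count = most_frequent_ip(sample_ips)
print(f"{ip} occurred {count} times")
```

Any iterable of strings can be passed in, so the generator from this answer should work in place of sample_ips.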

CodePudding user response：

You could do something like this:

Use a regex to search for IP addresses in the text file and append them to ip_list
Identify the unique IP addresses
Calculate the number of times each IP address is found
Display the results

Code:

import re

ip_list = []
with open('sample1.txt') as f:
    for line in f:
        match = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', line)
        # skip lines (e.g. blank ones) that contain no IP address
        if match:
            ip_list.append(match.group())

# Get unique items in the ip_list
unique_ip_list = set(ip_list)

ip_counts = {}
# Find out the counts for each IP address:
for ip in unique_ip_list:
    ip_counts[ip] = ip_list.count(ip)
    
print(ip_counts)
print()
print(f"Most common IP address: {max(ip_counts, key=ip_counts.get)} with {max(ip_counts.values())} times")

OUTPUT:

{'172.16.121.170': 2, '172.16.121.172': 3, '172.16.121.171': 1}

Most common IP address: 172.16.121.172 with 3 times

Tested with the following file:

sample1.txt

172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.171 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
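
The counting loop in this answer calls ip_list.count once per unique address, so the list is scanned again for every distinct IP. For larger logs, one possible variant counts in a single pass with a plain dict. This is only a sketch: the names IP_PATTERN, count_ips and log_lines are made up for illustration, and the lines are hard-coded instead of read from the file.

```python
import re

# same pattern as in the answer: four dot-separated groups of 1-3 digits
IP_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def count_ips(lines):
    """Count the first IPv4-looking string on each line in a single pass."""
    ip_counts = {}
    for line in lines:
        match = IP_PATTERN.search(line)
        if match is None:
            continue  # skip lines without an IP address
        ip = match.group()
        # start at 0 for an unseen IP, then add one
        ip_counts[ip] = ip_counts.get(ip, 0) + 1
    return ip_counts

# three log lines in the format shown in the question
log_lines = [
    "172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437",
    "172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437",
    "172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437",
]
ip_counts = count_ips(log_lines)
most_common_ip = max(ip_counts, key=ip_counts.get)
print(most_common_ip, ip_counts[most_common_ip])
```

Note that the pattern only checks the shape of the address, so something like 999.1.1.1 would also be counted.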

CodePudding user response：

Here is a test input file that lists only the IP addresses. If your file contains full log lines, you will first need to strip everything else from each line so that only the IP address remains.

text.txt:

    192.168.0.34
    192.168.0.13
    192.168.0.45
    192.168.0.34
    192.168.0.62
    192.168.0.34
    192.168.0.13
    192.168.0.13
    192.168.0.62
    192.168.0.13
    192.168.0.45
    192.168.0.62
    192.168.0.45
    192.168.0.13
    192.168.0.65
    192.168.0.45
    192.168.0.10
    192.168.0.45
    192.168.0.7
    192.168.0.45
    192.168.0.92
    192.168.0.45
    192.168.0.12
    192.168.0.45
    192.168.0.14
    192.168.0.45
    192.168.0.32

Here is the Python code with comments:

    from collections import OrderedDict
    
    ip_occurrences = OrderedDict()
    # open your file
    with open('text.txt', 'r') as f:
    
       # read all the lines (one IP address each) into a list
       ip_addresses = f.readlines()
    
       # loop through the list of IP addresses
       for ip in ip_addresses:
    
          # my text file had \n codes that needed to be filtered
          #  we will need to remember this when we reference ip_addresses
          #  for lookups as it is not filtered
          clean_ip = ip.replace('\n', '')
    
          # We check our clean IP address to see if it already exists.  
          # We only want to add new ip addresses
          if clean_ip not in ip_occurrences.keys():
    
             # create a new key with the clean_ip name with the count of 
             # the occurrences of the raw (unstripped) line in ip_addresses
             ip_occurrences[clean_ip] = ip_addresses.count(ip)
    
    # winner winner chicken dinner! this is the IP address that occurred the most
    most_freq_ip = max(ip_occurrences, key=ip_occurrences.get)
    
    # display it however you see fit.  I added ip_occurrences[most_freq_ip] to
    # show what the count is
    print( f'{most_freq_ip} occurred {ip_occurrences[most_freq_ip]} times')

Produces this output:

192.168.0.45 occurred 9 times
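
This answer assumes the file has already been reduced to one IP address per line. A minimal sketch of that preprocessing step, assuming the IP address is always the first whitespace-separated field, as in the question's example line; the first_fields helper and the log_lines list are made up for illustration:

```python
def first_fields(lines):
    """Yield the first whitespace-separated field of each non-blank line."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            yield fields[0]

# log lines in the format shown in the question, plus a blank line
log_lines = [
    "172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437\n",
    "\n",
    "172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437\n",
]
print(list(first_fields(log_lines)))
```

Passing an open file object instead of the list works the same way, since iterating over a file yields its lines. Because split() with no arguments also discards the trailing newline, the resulting strings need no further cleaning.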