How to sort ip addresses in csv file using python3

Time:10-13

I have a csv file that looks like this:

IP Address,Port,Protocol,State
192.168.3.1,53,tcp,open
192.168.13.100,80,tcp,open
192.168.3.1,443,tcp,close
192.168.3.71,1080,tcp,open
192.168.3.7,8888,tcp,open
192.168.23.12,80,tcp,filtered
192.168.3.12,443,tcp,open
192.168.3.12,631,tcp,open

How do I sort this by ip address, then by port number, in python 3?

I tried using this:

#!/bin/python3
# import modules 
import csv, ipaddress
  
data = csv.reader(open('list.csv'),delimiter=',')
  
data = sorted(data, key = ipaddress.IPv4Address)    
  
print('After sorting:')
print(data)

But I got an ipaddress.AddressValueError: Only decimal digits permitted in "['192" in "['192.168.3.1', '53', 'tcp', 'open']"

After sorting by IP address, the code should sort by port next, since the same IP address can appear more than once with different ports.

Been trying to figure this out for over a week. Thanks.

CodePudding user response:

The first problem is that the rows from your csv reader include the header row, which is not a valid IP address. To skip it, consume one line from the reader before doing anything else.

data = csv.reader(open('list.csv'),delimiter=',')
next(data) # Consumes the header line

data = sorted(...)

Side note: Use with so that the file is closed automatically when you exit the with block.

with open('list.csv') as file:
    data = csv.reader(file)
    next(data)
    data = sorted(...)
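
As a minimal, self-contained sketch of the header-skipping step: the snippet below reads from an in-memory string through io.StringIO instead of list.csv, only so that it runs without a file on disk. SAMPLE and the variable names are illustrative, not part of the original code.

```python
import csv
import io

# Hypothetical in-memory stand-in for list.csv (first rows of the question's data).
SAMPLE = """IP Address,Port,Protocol,State
192.168.3.1,53,tcp,open
192.168.13.100,80,tcp,open
"""

with io.StringIO(SAMPLE) as file:
    reader = csv.reader(file)
    header = next(reader)  # consumes the header row and keeps it for later use
    rows = list(reader)    # only the data rows are left in the reader

print(header)
print(rows)
```

Keeping the header in a variable, rather than discarding it, makes it easy to write it back out above the sorted rows later.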

The key argument takes a function, and sorted calls that function on every element of the iterable you're sorting. Your data is an iterable in which every element is a list representing one line of the csv file. That is why you got the error: ipaddress.IPv4Address received the whole list for each line, when it should only receive the first element of that list. You can use a lambda expression as the key to take each list and pass only its first element to ipaddress.IPv4Address.

data = sorted(data, key = lambda row: ipaddress.IPv4Address(row[0]))
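
To see why the key goes through ipaddress.IPv4Address rather than comparing the raw strings, here is a small sketch using the addresses from the question. The list name ips is just for illustration.

```python
import ipaddress

ips = [
    "192.168.3.1",
    "192.168.13.100",
    "192.168.3.71",
    "192.168.3.7",
    "192.168.23.12",
]

# Plain string sorting compares character by character, so "13" sorts before "3".
as_text = sorted(ips)

# IPv4Address objects compare by numeric value, octet by octet.
as_addresses = sorted(ips, key=ipaddress.IPv4Address)

print(as_text)
print(as_addresses)
```

The string sort puts 192.168.13.100 first, while the IPv4Address sort puts it after all the 192.168.3.x addresses.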

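Since you also want to sort by port, you can have your lambda return a tuple containing the IP address and the port number. The csv reader returns every field as a string, so convert the port to int; otherwise '443' would sort before '53'.

data = sorted(data, key = lambda row: (ipaddress.IPv4Address(row[0]), int(row[1])))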
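
Put together, the two-level sort might look like the sketch below when run against the data from the question. It again uses io.StringIO as a stand-in for list.csv, and it casts the port to int inside the key because csv.reader yields every field as a string. SAMPLE and by_ip_and_port are illustrative names.

```python
import csv
import io
import ipaddress

# Hypothetical in-memory copy of list.csv from the question.
SAMPLE = """IP Address,Port,Protocol,State
192.168.3.1,53,tcp,open
192.168.13.100,80,tcp,open
192.168.3.1,443,tcp,close
192.168.3.71,1080,tcp,open
192.168.3.7,8888,tcp,open
192.168.23.12,80,tcp,filtered
192.168.3.12,443,tcp,open
192.168.3.12,631,tcp,open
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # skip (and keep) the header row
rows = list(reader)

# Sort by numeric IP address first, then by numeric port.
by_ip_and_port = sorted(
    rows,
    key=lambda row: (ipaddress.IPv4Address(row[0]), int(row[1])),
)

print(",".join(header))
for row in by_ip_and_port:
    print(",".join(row))
```

The rows themselves are left untouched: both columns are still strings after sorting, since the conversions happen only inside the key function.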

You might find it useful if you converted the first column to IPv4Address objects in data itself, so that you can use them elsewhere. In that case, read your csv file line-by-line and do that before sorting it.

with open('list.csv') as file:
    reader = csv.reader(file)
    next(reader)
    data = []
    for row in reader:
        row[0] = ipaddress.IPv4Address(row[0])
        row[1] = int(row[1])  # ports are read as strings; convert so they sort numerically
        data.append(row)

    data.sort()

Here, you don't need a lambda function, because lists compare element by element: the first elements are compared, then the second elements if the first are equal, and so on. This only gives the right order when every column holds a type that compares correctly, so the port has to be an int rather than a string.
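
The element-by-element comparison can be checked with a few hand-built rows. This is only a sketch: the names a, b and c are made up for the example, and the rows are assumed to have already been converted to an IPv4Address and an int port.

```python
import ipaddress

# Rows as they would look after conversion: IPv4Address, int port, then strings.
a = [ipaddress.IPv4Address("192.168.3.1"), 443, "tcp", "close"]
b = [ipaddress.IPv4Address("192.168.3.1"), 53, "tcp", "open"]
c = [ipaddress.IPv4Address("192.168.3.12"), 443, "tcp", "open"]

rows = [a, c, b]
rows.sort()  # no key needed: lists compare first element, then second, and so on

# a and b share an address, so the port decides their order: 53 before 443.
print(rows)

# With string ports the tie-break goes wrong, because "443" < "53" as text.
print(["192.168.3.1", "443"] < ["192.168.3.1", "53"])
```

If two rows also share the same port, the comparison falls through to the protocol and state strings.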
