I have a csv file that looks like this:
IP Address,Port,Protocol,State
192.168.3.1,53,tcp,open
192.168.13.100,80,tcp,open
192.168.3.1,443,tcp,close
192.168.3.71,1080,tcp,open
192.168.3.7,8888,tcp,open
192.168.23.12,80,tcp,filtered
192.168.3.12,443,tcp,open
192.168.3.12,631,tcp,open
How do I sort this by IP address, then by port number, in Python 3?
I tried using this:
#!/bin/python3
# import modules
import csv, ipaddress
data = csv.reader(open('list.csv'),delimiter=',')
data = sorted(data, key = ipaddress.IPv4Address)
print('After sorting:')
print(data)
But I got an ipaddress.AddressValueError: Only decimal digits permitted in "['192" in "['192.168.3.1', '53', 'tcp', 'open']"
After sorting by IP address, the code should sort by port next, since the same IP address can appear with different ports.
Been trying to figure this out for over a week. Thanks.
CodePudding user response:
The first problem is that the data from your csv reader includes the header row. To skip the first line, just consume one line from the reader before doing anything else.
data = csv.reader(open('list.csv'),delimiter=',')
next(data) # Consumes the header line
data = sorted(...)
Side note: Use a with statement so that the file is closed automatically when you exit the with block.
with open('list.csv') as file:
    data = csv.reader(file)
    next(data)
    data = sorted(...)
Now, the key argument takes a function, and sorted calls that function on every element of the iterable you're sorting. Your data is an iterable where every element is a list representing one line of the csv file. You don't want to pass the entire list for each line (that is what raised the AddressValueError), you only want to pass the first element of that list. You can use a lambda expression as the key to take each list and pass only its first element to ipaddress.IPv4Address.
data = sorted(data, key = lambda row: ipaddress.IPv4Address(row[0]))
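To see why the key matters, here is a small sketch comparing a plain string sort with a sort keyed on ipaddress.IPv4Address. The addresses are taken from the question; the variable names are only for illustration.

```python
import ipaddress

ips = ['192.168.3.71', '192.168.13.100', '192.168.3.7', '192.168.23.12']

# Plain string comparison goes character by character,
# so '192.168.13...' lands before '192.168.3...'.
as_text = sorted(ips)

# IPv4Address objects compare by the numeric value of the address.
as_addresses = sorted(ips, key=ipaddress.IPv4Address)

print(as_text)
print(as_addresses)
```

The input strings are left untouched here; the key only decides the order.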
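Since you also want to sort by port, you can have your lambda return a tuple containing the IP address and port number. Convert the port to int, because csv.reader yields every field as a string and strings compare character by character (so '443' would sort before '53').
data = sorted(data, key = lambda row: (ipaddress.IPv4Address(row[0]), int(row[1])))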
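Putting the pieces together, a minimal end-to-end sketch could look like the following. It assumes the sample rows from the question and reads them from an in-memory io.StringIO instead of list.csv so it runs anywhere; the port is converted with int() inside the key so that ports compare as numbers rather than as text.

```python
import csv
import io
import ipaddress

# Sample data from the question, held in memory instead of list.csv.
sample = """IP Address,Port,Protocol,State
192.168.3.1,53,tcp,open
192.168.13.100,80,tcp,open
192.168.3.1,443,tcp,close
192.168.3.71,1080,tcp,open
192.168.3.7,8888,tcp,open
192.168.23.12,80,tcp,filtered
192.168.3.12,443,tcp,open
192.168.3.12,631,tcp,open
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # set the header aside so it is not sorted with the data

# Sort by numeric IP address first, then by numeric port.
rows = sorted(
    reader,
    key=lambda row: (ipaddress.IPv4Address(row[0]), int(row[1])),
)

print(','.join(header))
for row in rows:
    print(','.join(row))
```

With a real file, swapping io.StringIO(sample) for a file object opened in a with block should be the only change needed.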
You might find it useful to convert the first column to IPv4Address objects in data itself, so that you can use them elsewhere. In that case, read your csv file line by line and do that before sorting it.
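with open('list.csv') as file:
    reader = csv.reader(file)
    next(reader)  # Consumes the header line
    data = []
    for row in reader:
        row[0] = ipaddress.IPv4Address(row[0])
        row[1] = int(row[1])  # csv fields are strings; ports should compare as numbers
        data.append(row)
data.sort()
Here, you don't need to use a lambda function because list comparison automatically compares the first elements, then the second elements, and so on. This only works because the IP address and port in each row have been converted to types that compare numerically; the remaining columns are plain strings and only break ties.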
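As a quick illustration of that element-by-element comparison, here is a sketch with three hand-built rows (values from the question, ports already converted to int). The last line shows what would go wrong if the ports were left as strings.

```python
import ipaddress

a = [ipaddress.IPv4Address('192.168.3.1'), 443, 'tcp', 'close']
b = [ipaddress.IPv4Address('192.168.3.1'), 53, 'tcp', 'open']
c = [ipaddress.IPv4Address('192.168.3.12'), 443, 'tcp', 'open']

rows = [a, c, b]
rows.sort()  # no key: lists compare first element, then second, and so on

# a and b share an IP address, so the port decides their order.
for row in rows:
    print(row[0], row[1])

# Ports left as strings compare as text, which puts '443' before '53'.
string_ports_misordered = ['443'] < ['53']
print(string_ports_misordered)
```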