I have a 100,000-entry list of lists built from a CSV that was exported from a firewall log. My goal is to end up with a file that lists the ports used between any two IPs, such as:
1.1.1.1 to 2.2.2.2 ports (25, 53, 80)
2.2.2.2 to 1.1.1.1 ports (443, 123)
So far I have been able to read the file into a list, then create a list of source IPs and a list of destination IPs. I can then manually get the ports for one manually entered source/destination pair. However, there are 4 sources and 67 destinations, and I do not want to run this manually 268 times. What I want is to iterate over the list of lists, checking the source and destination of each entry, and collect the ports for each pair. My idea is a for loop over the lst object, with nested loops over the source list and the destination list, collecting the ports as I go. I'm not sure whether this can be done, or whether I'm going about it correctly at all.
I know there are some formatting issues and better ways to do some of this; I'm rather new to this.
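The nested-loop idea described above does work. A minimal sketch, assuming `lst` holds rows shaped like the sample log (source IP, destination IP, srcport, dstport, ...); the hard-coded rows are stand-ins for what `csv.reader` would return. Its weakness is that it rescans every row for every pair: with 4 sources, 67 destinations and 100,000 rows that is about 26.8 million row checks.

```python
# Stand-in rows shaped like the sample firewall log (assumption: the real
# data comes from csv.reader, as in the question).
lst = [
    ["1.1.1.1", "2.2.2.2", "srcport=58084", "dstport=161", "proto=17", "service=SNMP"],
    ["5.5.5.5", "2.2.2.2", "srcport=58082", "dstport=123", "proto=17", "service=NTP"],
    ["1.1.1.1", "3.3.3.3", "srcport=59089", "dstport=123", "proto=17", "service=NTP"],
    ["6.6.6.6", "3.3.3.3", "srcport=41376", "dstport=123", "proto=17", "service=NTP"],
    ["1.1.1.1", "4.4.4.4", "srcport=53546", "dstport=22", "proto=6", "app=SSH"],
]

# Unique, sorted source and destination IPs.
srciplist = sorted({row[0] for row in lst})
dstiplist = sorted({row[1] for row in lst})

# For every (source, destination) pair, rescan all rows and collect the
# unique destination ports. Simple, but O(sources * destinations * rows).
results = {}
for src in srciplist:
    for dst in dstiplist:
        ports = set()
        for row in lst:
            if row[0] == src and row[1] == dst:
                ports.add(row[3])
        results[(src, dst)] = sorted(ports)

for (src, dst), ports in results.items():
    print(f"Source IP:{src} Destination IP:{dst} Port_list :{ports}")
    print(f"The number of ports is {len(ports)}.")
```

A single pass that groups ports under a (source, destination) key avoids the repeated scans; both answers below take that route.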
Sample of the log (I will strip the srcport= labels later, as it's not critical at this point):
1.1.1.1,2.2.2.2,srcport=58084,dstport=161,proto=17,service=SNMP
5.5.5.5,2.2.2.2,srcport=58082,dstport=123,proto=17,service=NTP
1.1.1.1,3.3.3.3,srcport=59089,dstport=123,proto=17,service=NTP
6.6.6.6,3.3.3.3,srcport=41376,dstport=123,proto=17,service=NTP
1.1.1.1,4.4.4.4,srcport=53546,dstport=22,proto=6,app=SSH
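Stripping the `srcport=`-style labels can be done with `str.partition`, which splits on the first `=` only. A minimal sketch, assuming every field after the two IPs has the `key=value` shape shown in the sample (note the last field is sometimes `service=` and sometimes `app=`); the helper name `parse_row` is hypothetical.

```python
import csv
import io

# Two lines copied from the sample log; in practice this would be the open file.
sample = io.StringIO(
    "1.1.1.1,2.2.2.2,srcport=58084,dstport=161,proto=17,service=SNMP\n"
    "1.1.1.1,4.4.4.4,srcport=53546,dstport=22,proto=6,app=SSH\n"
)

def parse_row(row):
    """Return a dict: the two IPs plus every key=value field, labels stripped."""
    record = {"srcip": row[0], "dstip": row[1]}
    for field in row[2:]:
        key, _, value = field.partition("=")  # split on the first '=' only
        record[key] = value
    return record

records = [parse_row(row) for row in csv.reader(sample)]
for record in records:
    print(record["srcip"], record["dstip"], record["dstport"])
```

Because the keys come from the data itself, rows with `app=` instead of `service=` simply get a different key rather than breaking a fixed column layout.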
#! python3
import csv

#Read the file and convert the CSV to a usable list format
csv_filename = 'data-for-csv-reader_no_parentheses_v3.csv'
#need to add operation to strip " from lines
#open file and read into a list of lists:
with open(csv_filename, newline='') as f:
    reader = csv.reader(f)
    lst = list(reader)
#Extract all source IPs:
srcips = []
for item in lst:
    srcips.append(item[0])
#Deduplicate source IPs:
srciplist = [*set(srcips)]
print("The number of source IPs is " + str(len(srciplist)) + ".")
#Strip 'srcip=' off each entry (probably no longer needed)
srciplist_stripped = [j.strip('srcip=') for j in srciplist]
srciplist_stripped.sort()
print(srciplist_stripped)
#Extract all destination IPs:
dstips = []
for item in lst:
    dstips.append(item[1])
#Deduplicate destination IPs:
dstiplist = [*set(dstips)]
print("The number of destination IPs is " + str(len(dstiplist)) + ".")
#Strip 'dstip=' off each entry (probably no longer needed)
dstiplist_stripped = [j.strip('dstip=') for j in dstiplist]
dstiplist_stripped.sort()
print(dstiplist_stripped)
#Manual operation to get one source and one destination's ports:
port_list = []
for item in lst:
    if item[0] == srciplist_stripped[2] and item[1] == dstiplist_stripped[4]:
        port_list.append(item[3])
#Deduplicate and present the port list for the two IPs above
port_list = [*set(port_list)]
print("Source IP:" + str(srciplist_stripped[2]) + " Destination IP:" + str(dstiplist_stripped[4]) + " Port_list :" + str(port_list))
print("The number of ports is " + str(len(port_list)) + ".")
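A note on the `.strip('srcip=')` and `.strip('dstip=')` calls in the code above: `str.strip` treats its argument as a set of characters to trim from both ends, not as a prefix. It happens to be harmless for IP addresses (digits and dots are never in that set), but it can eat real data in other columns. A small demonstration; the value `service=dns` is made up, not taken from the log.

```python
field = "service=dns"  # hypothetical field value, not taken from the log

# strip() trims any of the characters s, e, r, v, i, c, = from BOTH ends,
# so the trailing 's' of 'dns' is removed as well.
stripped = field.strip("service=")

# removeprefix() (Python 3.9+) removes only the exact leading text.
prefix_removed = field.removeprefix("service=")

print(stripped, prefix_removed)
```

On Python versions older than 3.9, `field.partition("=")[2]` gives the same result as `removeprefix` for these `key=value` fields.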
The code won't work unless you run it against a CSV file. As written, it gives me the following (IPs edited):
The number of source IPs is 4.
['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4']
The number of destination IPs is 67.
['7.7.7.7', '6.6.6.6', '5.5.5.5', <--omitted for brevity-->]
Source IP:1.1.1.1 Destination IP:2.2.2.2 Port_list :['dstport=644', 'dstport=1039',<--omitted for brevity-->]
The number of ports is 873.
(IPs are faked so they don't line up with the indexes as presented in the sample firewall log)
I want it to output this:
Source IP:1.1.1.1 Destination IP:2.2.2.2 Port_list :['dstport=644', 'dstport=1039',<--omitted for brevity-->]
The number of ports is 873.
but for every IP address combination, with the results written to a file. The final output would be the block above repeated for all 4 sources × 67 destinations, i.e. 268 entries (many of which will have an empty Port_list and 0 for the number of ports).
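Since the goal is one entry per source/destination combination (4 × 67 = 268), including the empty ones, `itertools.product` can enumerate every pair after a single grouping pass over the rows. A minimal sketch under these assumptions: rows look like the sample log, ports are stored as integers without the `dstport=` label, and the output file name `port_report.txt` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import product

# Stand-in for open(csv_filename, newline=''): rows shaped like the sample log.
data = io.StringIO(
    "1.1.1.1,2.2.2.2,srcport=58084,dstport=161,proto=17,service=SNMP\n"
    "5.5.5.5,2.2.2.2,srcport=58082,dstport=123,proto=17,service=NTP\n"
    "1.1.1.1,3.3.3.3,srcport=59089,dstport=123,proto=17,service=NTP\n"
    "6.6.6.6,3.3.3.3,srcport=41376,dstport=123,proto=17,service=NTP\n"
    "1.1.1.1,4.4.4.4,srcport=53546,dstport=22,proto=6,app=SSH\n"
)

# One pass over the rows: group unique destination ports by (source, destination).
ports_by_pair = defaultdict(set)
sources, destinations = set(), set()
for row in csv.reader(data):
    src, dst = row[0], row[1]
    sources.add(src)
    destinations.add(dst)
    ports_by_pair[src, dst].add(int(row[3].partition("=")[2]))

# Every combination, including pairs that never talked (empty port list).
# .get() on a defaultdict does not create missing entries.
report_lines = []
for src, dst in product(sorted(sources), sorted(destinations)):
    ports = sorted(ports_by_pair.get((src, dst), ()))
    report_lines.append(
        f"Source IP:{src} Destination IP:{dst} Port_list :{ports}\n"
        f"The number of ports is {len(ports)}."
    )

with open("port_report.txt", "w") as out:  # hypothetical output file name
    out.write("\n".join(report_lines) + "\n")
```

With the real data, `sources` and `destinations` would hold 4 and 67 IPs, so the loop would produce the 268 entries described above.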
Answer:
I hope I've understood your question right.
You can load the source IPs into a dictionary (where the keys are source IPs and the values are dictionaries of the form {destination IP: [list of ports]}).
import csv
from itertools import product

src, dst = {}, set()

with open("data.csv", "r") as f_in:
    reader = csv.reader(f_in)
    for row in reader:
        # src[source IP][destination IP] -> list of destination ports
        src.setdefault(row[0], {}).setdefault(row[1], []).append(
            row[3].split("=")[-1]
        )
        dst.add(row[1])

# Every source/destination combination; .get() returns [] for pairs never seen
for s, d in product(src, dst):
    print(f"Source IP: {s} Destination IP: {d} Port_list: {src[s].get(d, [])}")
Prints:
Source IP: 1.1.1.1 Destination IP: 3.3.3.3 Port_list: ['123']
Source IP: 1.1.1.1 Destination IP: 4.4.4.4 Port_list: ['22']
Source IP: 1.1.1.1 Destination IP: 2.2.2.2 Port_list: ['161', '123']
Source IP: 5.5.5.5 Destination IP: 3.3.3.3 Port_list: []
Source IP: 5.5.5.5 Destination IP: 4.4.4.4 Port_list: []
Source IP: 5.5.5.5 Destination IP: 2.2.2.2 Port_list: ['123']
Source IP: 6.6.6.6 Destination IP: 3.3.3.3 Port_list: ['123']
Source IP: 6.6.6.6 Destination IP: 4.4.4.4 Port_list: []
Source IP: 6.6.6.6 Destination IP: 2.2.2.2 Port_list: []
Data used in data.csv:
1.1.1.1,2.2.2.2,srcport=58084,dstport=161,proto=17,service=SNMP
1.1.1.1,2.2.2.2,srcport=58084,dstport=123,proto=17,service=SNMP
5.5.5.5,2.2.2.2,srcport=58082,dstport=123,proto=17,service=NTP
1.1.1.1,3.3.3.3,srcport=59089,dstport=123,proto=17,service=NTP
6.6.6.6,3.3.3.3,srcport=41376,dstport=123,proto=17,service=NTP
1.1.1.1,4.4.4.4,srcport=53546,dstport=22,proto=6,app=SSH
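Two caveats with the answer's code on a 100,000-row log: it appends a port for every row, so the same port will repeat many times in a list, and `dst` is a set, whose iteration order for strings can change between runs because string hashing is randomized by default. A variation (still a sketch) that keeps the same nested-dict layout but stores sets and sorts the output; the sample is the answer's data.csv inlined with `io.StringIO`, plus one repeated 161 row added only to show deduplication.

```python
import csv
import io
from itertools import product

# The answer's data.csv inlined, plus one repeated port (161) to show deduplication.
data = io.StringIO(
    "1.1.1.1,2.2.2.2,srcport=58084,dstport=161,proto=17,service=SNMP\n"
    "1.1.1.1,2.2.2.2,srcport=58084,dstport=123,proto=17,service=SNMP\n"
    "1.1.1.1,2.2.2.2,srcport=58090,dstport=161,proto=17,service=SNMP\n"
    "5.5.5.5,2.2.2.2,srcport=58082,dstport=123,proto=17,service=NTP\n"
    "1.1.1.1,3.3.3.3,srcport=59089,dstport=123,proto=17,service=NTP\n"
    "6.6.6.6,3.3.3.3,srcport=41376,dstport=123,proto=17,service=NTP\n"
    "1.1.1.1,4.4.4.4,srcport=53546,dstport=22,proto=6,app=SSH\n"
)

# Same layout as the answer, but the innermost value is a set of int ports.
src, dst = {}, set()
for row in csv.reader(data):
    port = int(row[3].split("=")[-1])
    src.setdefault(row[0], {}).setdefault(row[1], set()).add(port)
    dst.add(row[1])

# Sorting both key collections makes the output order stable between runs.
for s, d in product(sorted(src), sorted(dst)):
    ports = sorted(src[s].get(d, set()))
    print(f"Source IP: {s} Destination IP: {d} Port_list: {ports} ({len(ports)} ports)")
```

Converting ports to `int` before sorting keeps them in numeric order (22, 123, 161) rather than string order.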
Answer:
Since you are already using the csv module, you can use a DictReader to refer to columns by name, which is more flexible than indexes. Then apply set() to list comprehensions to count the unique values. Example:
import csv

fieldnames = ['src_ip', 'dst_ip', 'src_port', 'dst_port', 'proto', 'service']

with open('firewall.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',', fieldnames=fieldnames)
    lines = list(reader)

unique_src_ips = set([line["src_ip"] for line in lines])
print(f"The number of source IPs is {len(unique_src_ips)}.")
unique_dst_ips = set([line["dst_ip"] for line in lines])
print(f"The number of destination IPs is {len(unique_dst_ips)}.")
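Building on the DictReader version above, the same named columns can drive the per-pair grouping the question asks for, printed in the "1.1.1.1 to 2.2.2.2 ports (...)" shape from the question. A sketch, assuming the six-column layout given by `fieldnames`; the in-memory sample stands in for `firewall.csv`, and `str.removeprefix` needs Python 3.9+.

```python
import csv
import io
from collections import defaultdict

fieldnames = ['src_ip', 'dst_ip', 'src_port', 'dst_port', 'proto', 'service']

# In-memory stand-in for firewall.csv (first three rows of the sample log).
csvfile = io.StringIO(
    "1.1.1.1,2.2.2.2,srcport=58084,dstport=161,proto=17,service=SNMP\n"
    "5.5.5.5,2.2.2.2,srcport=58082,dstport=123,proto=17,service=NTP\n"
    "1.1.1.1,3.3.3.3,srcport=59089,dstport=123,proto=17,service=NTP\n"
)
lines = list(csv.DictReader(csvfile, fieldnames=fieldnames))

# Group unique destination ports by (source, destination), using column names.
ports_by_pair = defaultdict(set)
for line in lines:
    port = line["dst_port"].removeprefix("dstport=")
    ports_by_pair[line["src_ip"], line["dst_ip"]].add(port)

# Output in the format from the question, ports in numeric order.
for (src_ip, dst_ip), ports in sorted(ports_by_pair.items()):
    print(f"{src_ip} to {dst_ip} ports ({', '.join(sorted(ports, key=int))})")
```

Pairs that never appear in the log are simply absent here; looping over `itertools.product` of the unique IPs, as in the first answer, fills in the empty combinations if they are needed.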