find if a number set has intersection with another set-CodePudding

background:

the service port number in firewall access-list has different format like:

single number: 22,443

range:10000-65535

I am working around some analysis to the firewall, need to find if a specified port reside in the access-list port numbers

for example:

the acl service port is String[] servicePorts = {"20-22","443","8080-8088","10000-65535"}

need to find if {"5672","15672-15674"} in the servicePorts

Plan1:

convert the servicePort into a Set, leave the rest works to Set Api

Plan2:

convert the ports into an unified format, each element has a "start-end" number like:

targetPorts = {"20-22","443-443","8080-8088","10000-65535"}

myPorts = {"5672-5672","15672-15674"}

then need to loop myPorts, check each elements against targetPorts elements with the comparation of start/end number.

Plan1 is more simple but is really heavy coz the ports set contains large elements.

I've considered there could have some binary operating method, but has no way out.

Any other plan to deal with the number intersection problem?

or which plan do you prefer.

CodePudding user response：

Let's do a quick analysis of the 3 approaches that have been mentioned so far.

For the analysis we're using the following variables:

number of acl ranges: n
number of ports in the acl: a (where a >= n)
number of service ranges: m
number of ports in the service: s (where s >= m)

Approach 1: Build set of individual ports

Development complexity: simple - add single values, iterate over ranges and either put values into the set or check for them
memory complexity: depends on the number of ports (up to 65k entries) - O(a s)
time complexity: assuming you're using a properly sized HashSet you'd get O(1) complexity for inserts and lookups, so O(a s) complexity

Note: if you build the acl set once complexity gets down to O(s) for each service.

Approach 2: Build sorted list/tree of acl ranges and check unsorted list of service ranges

Development complexity: simple/medium - convert single values to ranges, look for potential overlaps, check for actual overlaps
memory complexity: depends on the number of ranges but presumably considerably lower than number of ports - O(n m)
time complexity: O((m n) * log(n))
- build acl list: O(n * log(n))
- iterate over service port list: O(m)
- lookup and check for overlap in each service port range: O(m * log(n)) (could be more for freak situations but those aren't likely)

Note: if you build the acl list once and can reuse it time complexity gets down to O(m * log(n)) for each service.

Approach 3: Build single labelled list and count starts/ends (approach suggested by גלעד ברקן)

Development complexity: simple/medium - build labelled list, keep track of counts for both labels
memory complexity: depends on the number of ranges but presumably considerably lower than number of ports - O(n m)
time complexity: O((n m) * log(n m))
- iterate over n and m ranges for both lists: O(n m)
- sort the combined list: O((n m) * log(n m))
- iterate over the list once more to count: O(n m)

Note: you could build a sorted list for the acl once, copy it for each service and use insert sort etc. to add the service ranges. That would bring down time complexity to O(m * log(n m)) (O(m) for the iteration over m ranges and O(log(n m)) for the lookup of the insertion point of each range).

Conclusion

Given this analysis approach 2 would be preferable due to the lower time complexity (O(n m) * log(n)) as compared to O((n m)*log(n m))) and similar memory and development complexity (albeit the latter is more subjective).

For very small ranges approach 1 could be more feasible but that can turn very fast and approach 2 has the better general properties (i.e. for an unknown number of ranges with an unknown size).

CodePudding user response：

Plan 3, insert each start and end as a labeled tuple in a single sorted list

(t, start, 20), (t, end, 22), (t, start, 443), (t, end, 443), (m, start, 5672), (m, end, 5672), (t, start, 8080), (t, end, 8088), (t, start, 10000), (m, start, 15672)...

Traverse from small to large and keep a running count for each label by incrementing by 1 when encountering a start for that label, and decrementing by 1 when encountering an end for that label. An intersection is found if one count is greater than 0 when the other also becomes.