Can we reduce the time complexity here?-CodePudding

I have an AoC problem where I have been given the data below:

data = """2-4,6-8
    2-3,4-5
    5-7,7-9
    2-8,3-7
    6-6,4-6
    2-6,4-8"""

I need to find the number of pairs which fully contain another pair. For example, 2-8 fully contains 3-7, and 6-6 is fully contained by 4-6.

I have solved it using the below code:

 def aoc_part1(self, data):
        counter = 0
        for lines_data in data.splitlines():
            lines_data = lines_data.strip()
            first_range, second_range = self.__get_first_second_list_of_elements(lines_data)
            check_first_side_if_returns_true = all(item in first_range for item in second_range)
            check_second_side_if_returns_true = all(item in second_range for item in first_range)
            if check_first_side_if_returns_true or check_second_side_if_returns_true:
                counter  = 1
        return counter
def __get_first_second_list_of_elements(self, data):
        first_elf, second_elf = data.split(",")[0], data.split(",")[1]
        first_range_start, first_range_end = map(int, first_elf.split("-"))
        second_range_start, second_range_end = map(int, second_elf.split("-"))
        first_range = list(range(first_range_start, first_range_end   1))
        second_range = list(range(second_range_start, second_range_end   1))
        return first_range, second_range

I was just wondering about the time complexity here. I think it should be a brute force here because for every iteration all will run another loop. How can I optimize this solution in order to get linear time complexity?

first_range and second_range are of int types. check_first_side_if_returns_true and check_second_side_if_returns_true are the boolean variables that check if the list is entirely contained or not. Based on that, it returns True or False.

CodePudding user response：

Your solution looks pretty complicated. Why not do something like:

data = """2-4,6-8
2-3,4-5
5-7,7-9
2-8,3-7
6-6,4-6
2-6,4-8
"""

def included(line):
    (a1, b1), (a2, b2) = (map(int, pair.split("-")) for pair in line.strip().split(","))
    return (a1 <= a2 and b2 <= b1) or (a2 <= a1 and b1 <= b2)

print(sum(included(line) for line in data.splitlines()))

I did some timing with my AoC-input for day 4 (1,000 lines):

from timeit import timeit

# Extract the interval boundaries for the pairs
boundaries = [
    [tuple(map(int, pair.split("-"))) for pair in line.strip().split(",")]
    for line in data.splitlines()
]

# Version 1 with simple comparison of boundaries
def test1(boundaries):
    def included(pairs):
        (a1, b1), (a2, b2) = pairs
        return (a1 <= a2 and b2 <= b1) or (a2 <= a1 and b1 <= b2)
    
    return sum(included(pairs) for pairs in boundaries)

# Version 2 with range-subset test
def test2(boundaries):
    def included(pairs):
        (a1, b1), (a2, b2) = pairs
        numbers1, numbers2 = set(range(a1, b1   1)), set(range(a2, b2   1))
        return numbers1 <= numbers2 or numbers2 <= numbers1
        
    return sum(included(pairs) for pairs in boundaries)

# Test for identical result
print(test1(boundaries) == test2(boundaries))

# Timing
for i in 1, 2:
    t = timeit(f"test{i}(boundaries)", globals=globals(), number=1_000)
    print(f"Duration version {i}: {t:.1f} seconds")

Result here, on a mediocre machine (repl.it):

Duration version 1: 0.4 seconds
Duration version 2: 5.4 seconds

CodePudding user response：

You're prob. making it overcomplicated in the current approach. If you split the pairs to two sets - eg. a, and b then you could easily do a set ops. to check if there is overlapping. That should be faster than yours.

Something like this one-line:

   # some input reading, and split to a, b sets.
   # count = 0

   if set(range(a, b   1)) & set(range(x, y   1)):
      count  = 1                # that's part1 answer.

There are questions about the memory efficiency about this approach earlier, I've run some profiling and this is the result to share - it's confirmed there should be NO problem given this puzzle's input size.

Filename: day04.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    27   43.758 MiB   43.758 MiB           1   @profile
    28                                         def part2(file):
    29   43.762 MiB    0.004 MiB           1       ans = 0
    30
    31   43.770 MiB    0.000 MiB        1001       for line in open(file):
    32   43.770 MiB    0.004 MiB        1000           a, b, x, y = map(int, line.replace(",", "-").split("-"))
    33   43.770 MiB    0.000 MiB        1000           if set(range(a, b   1)) & set(range(x, y   1)):
    34   43.770 MiB    0.004 MiB         847               ans  = 1
    35
    36   43.770 MiB    0.000 MiB           1       return ans