An introductory textbook presents the following function as an example of polynomial complexity:
def intersect(L1, L2):
    """
    Assumes L1 and L2 are lists
    Returns a list without duplicates that is the intersection of L1 and L2
    """
    # Part i - Build a list containing common elements
    tmp = []
    for e1 in L1:
        for e2 in L2:
            if e1 == e2:
                tmp.append(e1)
                break
    # Part ii - Build a list without duplicates
    result = []
    for e in tmp:
        if e not in result:
            result.append(e)
    return result
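For reference, a quick check of what the function returns (my own example, not from the book); duplicates in the inputs do not survive into the output:

>>> intersect([1, 2, 2, 3], [2, 3, 4])
[2, 3]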
Questions
- The author says that the running time for part (i) is order Θ(len(L1)*len(L2)), but this seems to imply that part (i) runs in Θ(len(L1)*len(L2)) in all cases. Is this correct?
When I tried working through the example, I got different running times for the worst and best cases:
- Worst case: all of L1's elements match the last element of L2 ==> Θ(len(L1)*len(L2)).
- Best case: all of L1's elements match the first element of L2 ==> Θ(len(L1)).

...so I would have said part (i) is order Θ(len(L1)*len(L2)) in the worst case.
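To sanity-check those two cases, here is a small instrumented sketch (my own, not from the book) that counts how many e1 == e2 comparisons part (i) performs:

def count_part_i_comparisons(L1, L2):
    """Count the e1 == e2 comparisons made by part (i) of intersect."""
    comparisons = 0
    for e1 in L1:
        for e2 in L2:
            comparisons += 1
            if e1 == e2:
                break
    return comparisons

n = 1000
# Worst case: every e1 matches only the last element of L2
print(count_part_i_comparisons([1] * n, [0] * (n - 1) + [1]))  # 1000000 = n*n
# Best case: every e1 matches the first element of L2
print(count_part_i_comparisons([1] * n, [1] + [0] * (n - 1)))  # 1000 = n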
- The author then says that part (ii) is not Θ(len(tmp)) but rather Θ(len(tmp)*len(result)), since e not in result may require us to examine all elements in result, and this would run in Θ(len(result)). But since len(result) and len(tmp) are bounded by the smaller of len(L1) and len(L2), Θ(len(tmp)*len(result)) can be ignored.
I understand that Θ(len(tmp)*len(result)) is an additive term and can therefore be ignored, but again I'm not sure why the author makes a blanket statement regarding part (ii) - are they saying part (ii) is Θ(len(tmp)*len(result)) in all cases?
Since the loop in part (ii) depends on the output of the loop in part (i), I figured result would be length 1 in the worst case (as I've defined it above) and therefore part (ii)'s worst-case running time would be order Θ(len(tmp)), not Θ(len(tmp)*len(result)). This seems wrong, but I'm not sure how to characterise such loops.
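The dependence can be made visible with another instrumented sketch (again mine, not the book's). It counts the element comparisons performed by the e not in result tests, first for the tmp produced by the worst case defined above (n copies of one value), then for a tmp of n distinct values (which part (i) produces when, say, L1 == L2 == list(range(n))):

def count_part_ii_comparisons(tmp):
    """Count the element comparisons made by the 'e not in result' tests."""
    comparisons = 0
    result = []
    for e in tmp:
        if e in result:
            comparisons += result.index(e) + 1  # scan stops at the first match
        else:
            comparisons += len(result)          # scan examines all of result
            result.append(e)
    return comparisons, len(result)

n = 1000
# result never grows past length 1, so part (ii) does ~len(tmp) work:
print(count_part_ii_comparisons([1] * n))         # (999, 1)
# result grows at every step: 0 + 1 + ... + (n-1) = n*(n-1)/2 comparisons:
print(count_part_ii_comparisons(list(range(n))))  # (499500, 1000)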
I'm new to this topic so any guidance would be much appreciated.
EDIT: The same passage in an older edition of the book uses Big-O in place of Big-Theta.
CodePudding user response:
You are right. If L1 consists of just the first element of L2, repeated, then the inner loop of part (i) breaks immediately every time, so the first pair of loops takes time Θ(len(L1)) and tmp has length len(L1). The length of result then never exceeds 1, so the last loop also takes time Θ(len(L1)).
But this does not contradict the book.
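A quick self-contained check of that instance (my sketch, not part of the original answer):

# The instance described above: L1 is the first element of L2, repeated
n = 1000
L2 = list(range(n))
L1 = [L2[0]] * n

# Part (i): every e1 equals L2[0], so the inner loop breaks after one
# comparison -> about n comparisons in total, and len(tmp) == n.
# Part (ii): tmp is n copies of one value, so result never grows past
# length 1 and each 'e not in result' test scans a single element.
print(intersect(L1, L2))  # [0]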
CodePudding user response:
The author says that the running time for part (i) is order Θ(len(L1)*len(L2)), but this seems to imply that part (i) runs in Θ(len(L1)*len(L2)) in all cases. Is this correct?
No. The author uses Big Θ in the sense that they explain in section 11.2 of that same chapter, in the third edition (the text differs from the second edition) -- note the use of "worst case", which I have highlighted:

Many computer scientists will abuse Big O notation by making statements like, "the complexity of f(x) is O(x²)." By this they mean that in the *worst case* f will take no more than O(x²) steps to run. [...]

[...] we will use Big Theta (Θ) when we are describing something that is both an upper and a lower bound on the asymptotic *worst-case* running time.
Then your following thought is also explained:
...so I would have said part (i) is order Θ(len(L1)*len(L2)) in the worst case.
You'll now understand that this is exactly what the author wants to say.
Since the loop in part (ii) depends on the output of the loop in part (i), I figured result would be length 1 in the worst case (as I've defined it above) and therefore part (ii)'s worst-case running time would be order Θ(len(tmp)), not Θ(len(tmp)*len(result)).
It is true that you defined a worst case earlier on, but there are other worst-case instances for part (i) that are also worst-case instances for part (ii).
Take for instance the case where L1 has