An introductory textbook presents the following function as an example of polynomial complexity:
def intersect(L1, L2):
    """
    Assumes L1 and L2 are lists
    Returns a list without duplicates that is the intersection of L1 and L2
    """
    # Part i - Build a list containing common elements
    tmp = []
    for e1 in L1:
        for e2 in L2:
            if e1 == e2:
                tmp.append(e1)
                break
    # Part ii - Build a list without duplicates
    result = []
    for e in tmp:
        if e not in result:
            result.append(e)
    return result
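For reference, a quick check of what the function returns (my own example, not from the book); duplicates in the inputs do not survive into the output:

>>> intersect([1, 2, 2, 3], [2, 3, 4])
[2, 3]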
Questions
- The author says that the running time for part (i) is order Θ(len(L1)*len(L2)), but this seems to imply that part (i) runs in Θ(len(L1)*len(L2)) in all cases. Is this correct?
When I tried working through the example, I got different running times for the worst and best cases:
- Worst case: all of L1's elements match the last element of L2 ==> Θ(len(L1)*len(L2)).
- Best case: all of L1's elements match the first element of L2 ==> Θ(len(L1)).

...so I would have said part (i) is order Θ(len(L1)*len(L2)) in the worst case.
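To sanity-check those two cases, here is a small instrumented sketch (my own, not from the book) that counts how many e1 == e2 comparisons part (i) performs:

def count_part_i_comparisons(L1, L2):
    """Count the e1 == e2 comparisons made by part (i) of intersect."""
    comparisons = 0
    for e1 in L1:
        for e2 in L2:
            comparisons += 1
            if e1 == e2:
                break
    return comparisons

n = 1000
# Worst case: every e1 matches only the last element of L2
print(count_part_i_comparisons([1] * n, [0] * (n - 1) + [1]))  # 1000000 = n*n
# Best case: every e1 matches the first element of L2
print(count_part_i_comparisons([1] * n, [1] + [0] * (n - 1)))  # 1000 = n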
- The author then says that part (ii) is not Θ(len(tmp)) but rather Θ(len(tmp)*len(result)), since e not in result may require us to examine all elements in result, and this would run in Θ(len(result)). But since len(result) and len(tmp) are bounded by the smaller of len(L1) and len(L2), Θ(len(tmp)*len(result)) can be ignored.
I understand that Θ(len(tmp)*len(result)) is an additive term and can therefore be ignored, but again I'm not sure why the author makes a blanket statement regarding part (ii) - are they saying part (ii) is Θ(len(tmp)*len(result)) in all cases?
Since the loop in part (ii) depends on the output of the loop in part (i), I figured result would be length 1 in the worst case (as I've defined it above) and therefore part (ii)'s worst-case running time would be order Θ(len(tmp)), not Θ(len(tmp)*len(result)). This seems wrong, but I'm not sure how to characterise such loops.
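The dependence can be made visible with another instrumented sketch (again mine, not the book's). It counts the element comparisons performed by the e not in result tests, first for the tmp produced by the worst case defined above (n copies of one value), then for a tmp of n distinct values (which part (i) produces when, say, L1 == L2 == list(range(n))):

def count_part_ii_comparisons(tmp):
    """Count the element comparisons made by the 'e not in result' tests."""
    comparisons = 0
    result = []
    for e in tmp:
        if e in result:
            comparisons += result.index(e) + 1  # scan stops at the first match
        else:
            comparisons += len(result)          # scan examines all of result
            result.append(e)
    return comparisons, len(result)

n = 1000
# result never grows past length 1, so part (ii) does ~len(tmp) work:
print(count_part_ii_comparisons([1] * n))         # (999, 1)
# result grows at every step: 0 + 1 + ... + (n-1) = n*(n-1)/2 comparisons:
print(count_part_ii_comparisons(list(range(n))))  # (499500, 1000)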
I'm new to this topic so any guidance would be much appreciated.
EDIT: The same passage in an older edition of the book uses Big-O in place of Big-Theta.
CodePudding user response:
You are right. If L1 consists of just the first element of L2, repeated, then the inner loop of part (i) breaks immediately every time, so the first pair of loops takes time Θ(len(L1)) and tmp has length len(L1). The length of result then never exceeds 1, so the last loop also takes time Θ(len(L1)).
But this does not contradict the book.
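A quick self-contained check of that instance (my sketch, not part of the original answer):

# The instance described above: L1 is the first element of L2, repeated
n = 1000
L2 = list(range(n))
L1 = [L2[0]] * n

# Part (i): every e1 equals L2[0], so the inner loop breaks after one
# comparison -> about n comparisons in total, and len(tmp) == n.
# Part (ii): tmp is n copies of one value, so result never grows past
# length 1 and each 'e not in result' test scans a single element.
print(intersect(L1, L2))  # [0]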
CodePudding user response:
The author says that the running time for part (i) is order Θ(len(L1)*len(L2)), but this seems to imply that part (i) runs in Θ(len(L1)*len(L2)) in all cases. Is this correct?
No. The author uses Big Θ in the sense that they explain in section 11.2 of that same chapter, in the third edition (the text differs from the second edition) -- note the use of "worst case", which I have highlighted:

Many computer scientists will abuse Big O notation by making statements like, "the complexity of f(x) is O(x²)." By this they mean that in the *worst case* f will take no more than O(x²) steps to run. [...]

[...] we will use Big Theta (Θ) when we are describing something that is both an upper and a lower bound on the asymptotic *worst-case* running time.
Then your following thought is also explained:
...so I would have said part (i) is order Θ(len(L1)*len(L2)) in the worst case.
You'll now understand that this is exactly what the author wants to say.
Since the loop in part (ii) depends on the output of the loop in part (i), I figured result would be length 1 in the worst case (as I've defined it above) and therefore part (ii)'s worst-case running time would be order Θ(len(tmp)), not Θ(len(tmp)*len(result)).
It is true that you defined a worst case earlier on, but there are other worst-case instances for part (i) that are also worst-case instances for part (ii).
Take for instance the case where L1 has