Home > Software engineering >  Demonstration of sets being faster than arrays?
Demonstration of sets being faster than arrays?

Time:10-18

The goal is to find the intersection of 2 arrays. I know you're supposed to use sets to get a faster runtime. However, in my notebook I was surprised to find that my runtime is not faster with the sets method. (89 ms (array) vs. 220 ms (sets)). Is there something wrong with my code or something I'm misunderstanding?

A = range(50,1000000)
B = range(-1000000,60)

%%time
# O(m*n)?
def intersection(A, B):
  array = []
  for i in A:
    if i in B:
      array.append(i)
  return array

intersection(A,B)

%%time
# O(1)?
def intersection(A, B):
  set_a = set(A)
  set_b = set(B)
  return [i for i in set_a if i in set_b]
  # return set_a.intersection(set_b)

intersection(A,B)

CodePudding user response:

You are not comparing arrays to sets.

You are comparing ranges to sets.

Checking the membership in a range is almost instantaneous (two comparisons: one with the lower, one with the upper bound).

Checking the membership in an array would take millions of iterations in this case, which would be substantially longer than membership checks for sets.

If you convert the ranges to lists first, you see a huge difference:

import time

A = range(50,100000)
B = range(-100000,60)

def intersectionArrays(A, B):
  array = []
  arr_a = list(A)
  arr_b = list(B)
  for i in arr_a:
    if i in arr_b:
      array.append(i)
  return array


def intersectionSets(A, B):
  set_a = set(A)
  set_b = set(B)
  return [i for i in set_a if i in set_b]
  # return set_a.intersection(set_b)



startArr = time.process_time()
intersectionArrays(A,B)
endArr = time.process_time()

startSet = time.process_time()
intersectionSets(A,B)
endSet = time.process_time()

print(endArr - startArr)
print(endSet - startSet)

It prints something like

Arrays: 39.871636528
Sets:   0.008161553000000765

There is no point trying it with 1000000 - it would take hours.

Furthermore, in order to measure something vaguely realistic, you should shuffle the numbers, otherwise branch prediction might distort the results.

CodePudding user response:

I think the reason you're seeing this slowdown is because the set based intersection function is converting all of the data into a set, and this extra conversion step takes some time. When I do the conversion first, I get a faster time with the set than the list (23 ms vs 36 ms)

  • Related