Home > Blockchain >  Run time difference for "in" searching through "list" and "set" using
Run time difference for "in" searching through "list" and "set" using

Time:05-06

My understanding of list and set in Python are mainly that list allows duplicates, list allows ordered information, and list has position information. I found while I was trying to search if an element is with in a list, the runtime is much faster if I convert the list to a set first. For example, I wrote a code trying find the longest consecutive sequence in a list. Use a list from 0 to 10000 as an example, the longest consecutive is 10001. While using a list:

    start_time = datetime.now()
    nums = list(range(10000))
    longest = 0
    for number in nums:
        if number - 1 not in nums:
            length = 0
            ##Search if number   1 also in the list##
            while number   length in nums: 
                length  = 1
            longest = max(length, longest)
    end_time = datetime.now()
    timecost = 'Duration: {}'.format(end_time - start_time)
    print(timecost)

The run time for above code is "Duration: 0:00:01.481939" With adding only one line to convert the list to set in third row below:

    start_time = datetime.now()
    nums = list(range(10000))
    nums = set(nums)
    longest = 0
    for number in nums:
        if number - 1 not in nums:
            length = 0
            ##Search if number   1 also in the set(was a list)##
            while number   length in nums:
                length  = 1
            longest = max(length, longest)
    end_time = datetime.now()
    timecost = 'Duration: {}'.format(end_time - start_time)
    print(timecost)

The run time for above code by using a set is now "Duration: 0:00:00.005138", Many time shorter than search through a list. Could anyone help me to understand the reason for that? Thank you!

CodePudding user response:

This is a great question.

The issue with arrays is that there is no smarter way to search in some array a besides just comparing every element one by one.

  • Sometimes you'll get lucky and get a match on the first element of a.
  • Sometimes you'll get unlucky and not find a match until the last element of a, or perhaps none at all.
  • On average, you'll have to search half the elements of they array each time.

This is said to have a "time complexity" of O(len(a)), or colloquially, O(n). This means the time taken by the algorithm (searching for a value in array) is linearly propertional to the size of the input (the number of elements in the array to be searched). This is why it's called "linear search". Oh, your array got 2x bigger? Well your searches just got 2x slower. 1000x bigger? 1000x slower.

Arrays are great, but they're

  • Related