most efficient way to iterate over a large array looking for a missing element in Python-CodePudding

I was trying an online test. the test asked to write a function that given a list of up to 100000 integers whose range is 1 to 100000, would find the first missing integer.

for example, if the list is [1,4,5,2] the output should be 3.

I iterated over the list as follow

def find_missing(num)
    for i in range(1, 100001):
    if i not in num:
        return i

the feedback I receives is the code is not efficient in handling big lists. I am quite new and I couldnot find an answer, how can I iterate more efficiently?

CodePudding user response：

The first improvement would be to make yours linear by using a set for the repeated membership test:

def find_missing(nums)
    s = set(nums)
    for i in range(1, 100001):
        if i not in s:
            return i

Given how C-optimized python sorting is, you could also do sth like:

def find_missing(nums)
    s = sorted(set(nums))
    return next(i for i, n in enumerate(s, 1) if i != n)

But both of these are fairly space inefficient as they create a new collection. You can avoid that with an in-place sort:

from itertools import groupby

def find_missing(nums):
    nums.sort()  # in-place
    return next(i for i, (k, _) in enumerate(groupby(nums), 1) if i != k)

CodePudding user response：

You can use Counter to count every occurrence of your list. The minimum number with occurrence 0 will be your output. For example:

 from collections import Counter

 def find_missing():
    count = Counter(your_list)
    keys = count.keys() #list of every element in increasing order
    main_list = list(range(1:100000)) #the list of values from 1 to 100k
    missing_numbers = list(set(main_list) - set(keys))
    your_output = min(missing_numbers)
    return your_output

CodePudding user response：

For any range of numbers, the sum is given by Gauss's formula:

# sum of all numbers up to and including nums[-1] minus
# sum of all numbers up to but not including nums[-1]
expected = nums[-1] * (nums[-1]   1) // 2 - nums[0] * (nums[0] - 1) // 2

If a number is missing, the actual sum will be

actual = sum(nums)

The difference is the missing number:

result = expected - actual

This compulation is O(n), which is as efficient as you can get. expected is an O(1) computation, while actual has to actually add up the elements.

A somewhat slower but similar complexity approach would be to step along the sequence in lockstep with either a range or itertools.count:

for a, e in zip(nums, range(nums[0], len(nums)   nums[0])):
    if a != e:
        return e  # or break if not in a function

Notice the difference between a single comparison a != e, vs a linear containment check like e in nums, which has to iterate on average through half of nums to get the answer.

CodePudding user response：

This works without numpy:

def find_missing(a_list):
    missing_list = []
    a_list.sort()
    minum = min(a_list)
    maxnum = max(a_list)
    for i in range(minum, maxnum 1):
        if i not in a_list:
            missing_list.append(i)
    return missing_list

My_list = [4,2,1,5,6,7,8,11,10]
missingIntList = find_missing(My_list)
if missingIntList:
    for num in missingIntList:
        print(f"{num} is missing")
else:
    print("Found them all")

You could also use difference():

def find_missing(a_lst):
    start = a_lst[0]
    end = a_lst[-1]
    return sorted(set(range(start, end   1)).difference(a_lst))