I was trying an online test. the test asked to write a function that given a list of up to 100000 integers whose range is 1 to 100000, would find the first missing integer.
for example, if the list is [1,4,5,2] the output should be 3.
I iterated over the list as follow
def find_missing(num)
for i in range(1, 100001):
if i not in num:
return i
the feedback I receives is the code is not efficient in handling big lists. I am quite new and I couldnot find an answer, how can I iterate more efficiently?
CodePudding user response:
The first improvement would be to make yours linear by using a set
for the repeated membership test:
def find_missing(nums)
s = set(nums)
for i in range(1, 100001):
if i not in s:
return i
Given how C-optimized python sorting is, you could also do sth like:
def find_missing(nums)
s = sorted(set(nums))
return next(i for i, n in enumerate(s, 1) if i != n)
But both of these are fairly space inefficient as they create a new collection. You can avoid that with an in-place sort:
from itertools import groupby
def find_missing(nums):
nums.sort() # in-place
return next(i for i, (k, _) in enumerate(groupby(nums), 1) if i != k)
CodePudding user response:
You can use Counter
to count every occurrence of your list. The minimum number with occurrence 0 will be your output. For example:
from collections import Counter
def find_missing():
count = Counter(your_list)
keys = count.keys() #list of every element in increasing order
main_list = list(range(1:100000)) #the list of values from 1 to 100k
missing_numbers = list(set(main_list) - set(keys))
your_output = min(missing_numbers)
return your_output
CodePudding user response:
For any range of numbers, the sum is given by Gauss's formula:
# sum of all numbers up to and including nums[-1] minus
# sum of all numbers up to but not including nums[-1]
expected = nums[-1] * (nums[-1] 1) // 2 - nums[0] * (nums[0] - 1) // 2
If a number is missing, the actual sum will be
actual = sum(nums)
The difference is the missing number:
result = expected - actual
This compulation is O(n)
, which is as efficient as you can get. expected
is an O(1)
computation, while actual
has to actually add up the elements.
A somewhat slower but similar complexity approach would be to step along the sequence in lockstep with either a range or itertools.count
:
for a, e in zip(nums, range(nums[0], len(nums) nums[0])):
if a != e:
return e # or break if not in a function
Notice the difference between a single comparison a != e
, vs a linear containment check like e in nums
, which has to iterate on average through half of nums
to get the answer.
CodePudding user response:
This works without numpy:
def find_missing(a_list):
missing_list = []
a_list.sort()
minum = min(a_list)
maxnum = max(a_list)
for i in range(minum, maxnum 1):
if i not in a_list:
missing_list.append(i)
return missing_list
My_list = [4,2,1,5,6,7,8,11,10]
missingIntList = find_missing(My_list)
if missingIntList:
for num in missingIntList:
print(f"{num} is missing")
else:
print("Found them all")
You could also use difference():
def find_missing(a_lst):
start = a_lst[0]
end = a_lst[-1]
return sorted(set(range(start, end 1)).difference(a_lst))