The task:
Write a function that receives 3 arrays and returns an array. The first array contains n integers, their values range between 0 and 10^9. "numbers". The second array is a low-range array, which contains the lower end of a range, it contains q integers. "low". The third array is a high-range array, which contains the higher end of a range, it contains q integers. "high".
The function should return an array that contains the number of integers in the first array, that fall in its range, given by the low-range and high-range arrays.
In the returned array, at index i, there should be the number of integers in "numbers" which are bigger or equal to low[i] and smaller or equal to high[i].
Examples:
count_range([12,13,14,15,17],[14],[14]) should return [1]
count_range([12,13,14,15,17],[14,15],[14,18]) should return [1,2]
count_range([12,13,14,15,17],[12],[17]) should return [5]
This is how I solved the question but I feel like there might be a more efficient way to solve seeing as the arrays could be long and the numbers could be really big.
I'd be glad to get some insights or tests that challenge this could to help me think in a better direction.
def binarySearch(data, val):
highIndex = len(data) - 1
lowIndex = 0
while highIndex > lowIndex:
index = math.ceil((highIndex lowIndex) / 2)
sub = data[index]
if sub > val:
if highIndex == index:
return sorted([highIndex, lowIndex])
highIndex = index
else:
if lowIndex == index:
return sorted([highIndex, lowIndex])
lowIndex = index
return sorted([highIndex, lowIndex])
def count_range(numbers, low, high):
numbers.sort()
result = []
range_dict = {}
for i in range(len(numbers)):
if numbers[i] not in range_dict:
range_dict[numbers[i]] = i
for i in range(len(low)):
low_r = low[i]
high_r = high[i]
if low_r not in range_dict:
range_dict[low_r] = binarySearch(numbers, low_r)[0]
low_index = range_dict.get(low_r)
if high_r not in range_dict:
range_dict[high_r] = binarySearch(numbers, high_r)[0]
high_index = range_dict.get(high_r)
while high_index 1 < len(numbers) and numbers[high_index 1] == numbers[high_index]:
high_index = 1
if low_r in numbers or low_r < numbers[0]:
low_index -= 1
result.append(high_index - low_index)
print(result)
return result
CodePudding user response:
Using Python's bisect
:
from bisect import bisect_left, bisect_right
def count_range(numbers, low, high):
numbers.sort()
return [bisect_right(numbers, hi) - bisect_left(numbers, lo)
for hi, lo in zip(high, low)]
In my solution, the sorting actually took about 90% of my time, then handling the queries only took about 10% of my time.
Benchmark with 100,000 numbers and 1000 ranges (Try it online!):
71.6 ms count_range_Sara
24.1 ms count_range_Kelly
6303.6 ms count_range_Nin17
70.5 ms count_range_Sara
23.1 ms count_range_Kelly
6225.0 ms count_range_Nin17
64.5 ms count_range_Sara
23.1 ms count_range_Kelly
6306.7 ms count_range_Nin17
(Note: Sara's solution I measured is the original slightly buggy one, which I included despite the buggyness as the question is about efficiency. And Nin17's solution that I measured is their original one from their now deleted answer, not the NumPy one.)
CodePudding user response:
With numpy:
import numpy as np
def count_range(numbers, low, high):
numbers.sort()
return np.searchsorted(numbers, high, 'right') - np.searchsorted(numbers, low, 'left')
Benchmark:
from timeit import timeit
def test_data():
RANGE = range(10**9)
numbers = random.choices(RANGE, k=10**5)
ranges = (sorted(random.sample(RANGE, 2))
for _ in range(1000))
[*low], [*high] = zip(*ranges)
return [numbers, low, high]
test = test_data()
%timeit kelly(*[t.copy() for t in test])
test2 = [np.array(i) for i in test]
%timeit Nin17(*[t.copy() for t in test2])
Output:
13.4 ms ± 77.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.63 ms ± 14 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)