Given 100 random numbers with duplicates, I want to count how many numbers are inside an interval of

Time:09-12

For example:

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]


# Pseudocode: x and y are the counts of values that fall in each interval
x = first_interval[0] <= data <= second_interval[0]
y = first_interval[1] <= data <= second_interval[1]  # and so on

I want to know how many numbers from data are between 40-49, 50-59, 60-69, and so on.

frequency = [4, 6]  # 4 is x and 6 is y

CodePudding user response:

Iterate over the bounds with zip, then use a list comprehension to keep only the values that fall inside each interval:

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]

result = {}
for start, end in zip(first_interval, second_interval):
    result[(start, end)] = len([v for v in data if start <= v <= end])

print(result)
# {(40, 49): 4, (50, 59): 6, (60, 69): 10, (70, 79): 4, (80, 89): 4, (90, 99): 2}

print(result[(40, 49)])
# 4

The version with a list and len is easier to understand:

result[(start, end)] = len([v for v in data if start <= v <= end])

But the following version performs better on larger inputs: because it uses a generator, it does not have to build a whole list only to discard it right after counting.

result[(start, end)] = sum(1 for v in data if start <= v <= end)

Another version does not use the predefined bounds and is much more performant: its complexity is O(n) rather than O(n*m) like the first one, because you iterate over the values once instead of once per pair of bounds. It assumes every interval is 10 wide and starts at a multiple of 10.

from collections import defaultdict

result = defaultdict(int)

for value in data:
    start = 10 * (value // 10)  # lower bound of the decade containing value
    result[(start, start + 9)] += 1
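
If the intervals are not all 10 wide, a binary search over the lower bounds is one way to keep the single pass over the data. The following is only a sketch, assuming the intervals are sorted and do not overlap; the use of bisect and the name frequency are choices made for this illustration, not part of the answer above.

```python
from bisect import bisect_right

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]

# Assumption: first_interval is sorted and the intervals do not overlap.
frequency = [0] * len(first_interval)
for value in data:
    # Index of the last lower bound that is <= value (-1 if there is none).
    i = bisect_right(first_interval, value) - 1
    # Count the value only if it also respects the matching upper bound.
    if i >= 0 and value <= second_interval[i]:
        frequency[i] += 1

print(frequency)
# [4, 6, 10, 4, 4, 2]
```

Each lookup costs O(log m), so the whole count is O(n log m), and values that fall in a gap between intervals are simply skipped.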

CodePudding user response:

This may help you:

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
Data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]


def find_occurence(start, end, data):
    # Count the values of data that lie in the closed interval [start, end]
    counter = 0
    for i in data:
        if start <= i <= end:
            counter += 1
    return counter

print(find_occurence(first_interval[0], second_interval[0], Data))  # this gives you the answer for x; do the same for y

Note: start is the lower bound of the interval and end is the upper bound; both are included in the count.
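
To get the whole frequency list from the question rather than one count at a time, the same function could be applied to every pair of bounds. This is a minimal sketch that repeats the function so it runs on its own; the list comprehension with zip is an addition for illustration.

```python
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
Data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]


def find_occurence(start, end, data):
    # Count the values of data that lie in the closed interval [start, end]
    counter = 0
    for i in data:
        if start <= i <= end:
            counter += 1
    return counter


# One count per (start, end) pair, in the same order as the bounds.
frequency = [find_occurence(start, end, Data)
             for start, end in zip(first_interval, second_interval)]

print(frequency)
# [4, 6, 10, 4, 4, 2]
```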

CodePudding user response:

We can use numpy.histogram with bin edges defined by:

  • the values of first_interval as left edges; each bin is open on the right
  • max(second_interval) as the closing edge of the rightmost bin, which includes its right edge

Code

import numpy as np

# Generate counts and bins (rightmost edge given by max(second_interval))
frequency, bins = np.histogram(data, bins=first_interval + [max(second_interval)])

# Show results
for i in range(len(frequency)):
    if i < len(frequency) - 1:
        print(f'{bins[i]}-{bins[i + 1] - 1} : {frequency[i]}')  # frequency doesn't include right edge
    else:
        print(f'{bins[i]}-{bins[i + 1]} : {frequency[i]}')      # frequency includes right edge in last bin

Output

40-49 : 4
50-59 : 6
60-69 : 10
70-79 : 4
80-89 : 4
90-99 : 2
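
A possible variation, assuming the data contains only integers: closing the last bin at second_interval[-1] + 1 makes every bin half-open, so the printing loop no longer needs a special case for the last bin. The name edges is introduced here for illustration only.

```python
import numpy as np

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]

# Assumption: integer data, so the closed interval 90-99 holds the same
# values as the half-open interval [90, 100).
edges = first_interval + [second_interval[-1] + 1]
frequency, bins = np.histogram(data, bins=edges)

# Every bin can now be labelled directly from the original bounds.
for start, end, count in zip(first_interval, second_interval, frequency):
    print(f'{start}-{end} : {count}')
```

With non-integer data this shortcut would also count values strictly between 99 and 100, so the version above is the safer choice in that case.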