For example:
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
x = first_interval[0] <= data <= second_interval[0]  # pseudocode: count of values in 40-49
y = first_interval[1] <= data <= second_interval[1]  # and so on
I want to know how many numbers in data fall between 40-49, 50-59, 60-69, and so on:
frequency = [4, 6]  # 4 is x and 6 is y
CodePudding user response:
Iterate over the bounds using zip, then use a list comprehension to keep only the values that fall in each range.
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
result = {}
for start, end in zip(first_interval, second_interval):
    result[(start, end)] = len([v for v in data if start <= v <= end])
print(result)
# {(40, 49): 4, (50, 59): 6, (60, 69): 10, (70, 79): 4, (80, 89): 4, (90, 99): 2}
print(result[(40, 49)])
# 4
The version with a list and len is easier to understand:
result[(start, end)] = len([v for v in data if start <= v <= end])
But the following version performs better on larger inputs: because it uses a generator, it doesn't have to build the whole list only to discard it right after.
result[(start, end)] = sum(1 for v in data if start <= v <= end)
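As a minimal sketch, the generator form can be wrapped in a small helper that returns the frequency list asked for in the question. The helper name count_in_ranges is hypothetical and not part of the original answer; both bounds are treated as inclusive, as in the question.

```python
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]


def count_in_ranges(values, starts, ends):
    """Return one count per (start, end) pair, both bounds inclusive."""
    return [
        # The generator yields 1 per matching value, so no temporary list is built.
        sum(1 for v in values if start <= v <= end)
        for start, end in zip(starts, ends)
    ]


frequency = count_in_ranges(data, first_interval, second_interval)
print(frequency)  # [4, 6, 10, 4, 4, 2]
```

The result is ordered like first_interval, so frequency[0] is x and frequency[1] is y.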
Another version doesn't use the predefined bounds, and so performs much better: its complexity is O(n) rather than the O(n*m) of the first one, because you iterate over the values once instead of once per pair of bounds. It relies on every range being a decade that starts on a multiple of 10, as in the example.
from collections import defaultdict

result = defaultdict(int)
for value in data:
    start = 10 * (value // 10)  # lower bound of the decade containing value
    result[(start, start + 9)] += 1
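A self-contained sketch of this single-pass idea might look like the following. The function name count_by_decade is hypothetical, and the code assumes non-negative integers grouped into ranges of width 10 that start on a multiple of 10.

```python
from collections import defaultdict

data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]


def count_by_decade(values):
    """Count values per decade in one pass; keys are (start, start + 9)."""
    counts = defaultdict(int)
    for value in values:
        start = 10 * (value // 10)  # e.g. 47 -> 40, 65 -> 60
        counts[(start, start + 9)] += 1
    return dict(counts)


decade_counts = count_by_decade(data)
print(decade_counts)
# {(40, 49): 4, (50, 59): 6, (60, 69): 10, (70, 79): 4, (80, 89): 4, (90, 99): 2}
```

Note that a decade containing no values does not appear in the result at all, whereas the zip-based version reports it with a count of 0.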
CodePudding user response:
This may help you:
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
Data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
def find_occurence(start, end, data):
    counter = 0
    for i in data:
        if start <= i <= end:
            counter += 1
    return counter
print(find_occurence(first_interval[0], second_interval[0], Data))  # this gives you the answer for x; use index 1 the same way for y
Note: start is the lower bound of the range and end is the upper bound; both are included.
CodePudding user response:
We can use numpy.histogram with bins defined by:
- the values of first_interval as bin edges (each bin is open on the right)
- max(second_interval) as the closing edge of the rightmost bin
Code
import numpy as np
# Generate counts and bins (rightmost edge given by max(second_interval))
frequency, bins = np.histogram(data, bins=first_interval + [max(second_interval)])
# Show results
for i in range(len(frequency)):
    if i < len(frequency) - 1:
        print(f'{bins[i]}-{bins[i + 1] - 1} : {frequency[i]}')  # bin excludes its right edge
    else:
        print(f'{bins[i]}-{bins[i + 1]} : {frequency[i]}')  # last bin includes its right edge
Output
40-49 : 4
50-59 : 6
60-69 : 10
70-79 : 4
80-89 : 4
90-99 : 2
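If NumPy is not available, a similar binning idea can be sketched with the standard library's bisect module. This is a swapped-in technique, not what numpy.histogram is documented to do internally. The function name count_with_bisect is hypothetical, and the sketch assumes the start bounds are sorted in ascending order and the ranges do not overlap.

```python
from bisect import bisect_right

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]


def count_with_bisect(values, starts, ends):
    """Count values per inclusive [start, end] range using binary search."""
    counts = [0] * len(starts)
    for v in values:
        i = bisect_right(starts, v) - 1  # index of the last start <= v, or -1
        if i >= 0 and v <= ends[i]:      # skip values outside every range
            counts[i] += 1
    return counts


frequency = count_with_bisect(data, first_interval, second_interval)
for start, end, count in zip(first_interval, second_interval, frequency):
    print(f'{start}-{end} : {count}')
```

Each value costs one binary search, so the total work is O(n log m) for n values and m ranges, and values that fall outside every range are ignored.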