A more efficient function for calculating the proportion of observations that fall within a specified interval

Time:11-12

I've written a function which calculates the proportion of observations that fall within a specified interval. So, if our observations are assessment marks, we can find out the proportion of students that got, say, between 70 and 100 marks. I've included a boolean parameter because in every interval except the last (the one whose upper bound is the largest observation), a value equal to the upper bound belongs to the next interval. For example, if we're looking at marks between 50 and 70, we don't want to include 70. My function is:

import numpy as np

def compute_interval_proportion(observations, lower_bound, upper_bound, include_upper):
    """
    Calculates the proportion of observations that fall within a specified interval.
    If include_upper is True, the interval is [lower_bound, upper_bound];
    otherwise it is [lower_bound, upper_bound).
    """
    observations = np.asarray(observations)
    if include_upper:
        indices = np.where((observations >= lower_bound)
                           & (observations <= upper_bound))
    else:
        indices = np.where((observations >= lower_bound)
                           & (observations < upper_bound))

    count = len(observations[indices])
    proportion = round(count / len(observations), 3)

    return proportion

This function works, I think, but I feel it is a bit pedestrian (e.g. lots of parameters) and perhaps there is a more sleek or quicker way of doing it. Perhaps there is a way of avoiding requiring the user to manually specify whether they want to include the upper bound or not. Any suggestions?
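For reference, NumPy's `np.histogram` already follows the binning convention described in the question: according to its documentation, every bin is half-open, [a, b), except the last, which is closed, [a, b]. A minimal sketch that returns the proportion for every band in one call; the helper name `interval_proportions`, the band edges and the marks are made-up example values, not from the question.

```python
import numpy as np

def interval_proportions(observations, edges):
    """Proportion of observations in each band defined by consecutive edges.

    np.histogram treats all bins as [a, b) except the last, which is [a, b],
    so the caller never has to say whether to include the upper bound.
    """
    obs = np.asarray(observations)
    counts, _ = np.histogram(obs, bins=edges)
    # Values outside [edges[0], edges[-1]] fall in no bin,
    # but they still count towards the denominator.
    return np.round(counts / obs.size, 3)

marks = np.array([45, 50, 62, 70, 70, 88, 100])
print(interval_proportions(marks, [0, 50, 70, 100]))  # [0.143 0.286 0.571]
```

Here 70 lands in the [70, 100] band rather than [50, 70), and 100 is still counted because the last bin is closed.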

Answer:

I've tried to simplify your function; the result is below. The main changes are:

  • We automatically detect whether upper is at or above the largest observation, in which case we include the bound in the interval.
  • NumPy conveniently lets you sum booleans by casting False to 0 and True to 1, which allows us to turn the proportion calculation into a simple mean.
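The casting mentioned in the second point can be seen on a small hand-written mask (the values are made up for illustration):

```python
import numpy as np

# A boolean mask, e.g. the result of (observations >= lower) & (observations < upper).
mask = np.array([True, False, True, True])

# Summing treats True as 1 and False as 0, so this is the number of matches.
print(mask.sum())   # 3
# The mean is that count divided by the number of elements, i.e. the proportion.
print(mask.mean())  # 0.75
```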
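import numpy as np

def compute_interval_proportion(observations, lower, upper):
    observations = np.asarray(observations)
    # If upper reaches the largest observation, treat this as the last interval and include the bound.
    if upper >= observations.max():
        upper_cond = observations <= upper
    else:
        upper_cond = observations < upper
    # The mean of a boolean mask is the fraction of True values.
    proportion = ((observations >= lower) & upper_cond).mean()
    return proportion.round(3)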
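One consequence of detecting the last interval with `observations.max()` is that the result depends on the data: if nobody happened to score above 70, a query for [50, 70) would silently include the 70s. A minimal sketch of an alternative, assuming the highest possible mark is known in advance; the function name `proportion_on_scale` and its `max_possible` parameter are hypothetical additions, not part of the answer above.

```python
import numpy as np

def proportion_on_scale(observations, lower, upper, max_possible=100):
    """Proportion of observations in [lower, upper), or in [lower, upper]
    when upper is the top of the marking scale (max_possible, assumed known)."""
    obs = np.asarray(observations)
    if upper >= max_possible:
        upper_cond = obs <= upper   # last band: include the top mark
    else:
        upper_cond = obs < upper    # every other band: half-open
    return round(float(((obs >= lower) & upper_cond).mean()), 3)

marks = np.array([45, 50, 62, 70, 70])   # nobody scored above 70
print(proportion_on_scale(marks, 50, 70))    # 0.4 (50 and 62 only)
print(proportion_on_scale(marks, 70, 100))   # 0.4 (the two 70s)
```

With the data-driven check, the first query would return 0.8 instead, because 70 equals `observations.max()` and is therefore treated as the inclusive top of the range.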