How to find median in Numpy 2d array with matching column

Time:02-24

From basic math I know that the median salary across all jobs listed is 40000, but how would I obtain that using NumPy?

E.g. find the median salary across all jobs listed, where:

  • 1st column = salary

  • 2nd column = no. of jobs advertised

```
import numpy as np

# column 0 = salary, column 1 = number of jobs advertised
x = np.array([
    [10000, 329],
    [20000, 329],
    [30000, 323],
    [40000, 310],
    [50000, 284],
    [60000, 232],
    [70000, 189],
    [80000, 130],
    [90000, 87],
    [100000, 71],
])
```

CodePudding user response:

You have a frequency table. The median is the first value in x[:, 0] whose cumulative frequency reaches the midpoint of the total count.

You can use:

# cumulative frequency
cf = np.cumsum(x[:, 1])

# total items
n = cf[-1] 

# 1-based position of the middle item (fractional when n is even)
midpoint = (n + 1) / 2

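# first salary whose cumulative frequency reaches the midpoint
# (<= rather than <, so an odd total whose middle item is the last one in its row is not skipped)
median = x[:, 0][midpoint <= cf][0]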

However, you should add a special case for when the total count is even and the two middle items fall in different rows, so that the median is the mean of the two adjacent values, such as:

x = np.array([[1, 2], [2, 2]])

The true median is 1.5, whereas this method gives 2.
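
One way to cover that special case is to look up both middle positions and average them. The following is only a sketch: the function name `freq_table_median` is made up for illustration, and it assumes the rows are sorted by value and the counts are non-negative integers. It uses `np.searchsorted` in place of the boolean mask to find the first row whose cumulative frequency reaches a given position.

```python
import numpy as np

def freq_table_median(table):
    """Median of a frequency table with columns (value, count).

    Hypothetical helper; assumes rows are sorted by value and
    counts are non-negative integers with a positive total.
    """
    table = np.asarray(table)
    values, counts = table[:, 0], table[:, 1]

    cf = np.cumsum(counts)  # cumulative frequency
    n = cf[-1]              # total number of items

    # 1-based positions of the two middle items (the same item when n is odd)
    lo_pos = (n + 1) // 2
    hi_pos = n // 2 + 1

    # index of the first row whose cumulative frequency is >= the position
    lo = values[np.searchsorted(cf, lo_pos)]
    hi = values[np.searchsorted(cf, hi_pos)]

    # average of the two middle values; equals the single middle value when n is odd
    return (lo + hi) / 2

x = np.array([
    [10000, 329], [20000, 329], [30000, 323], [40000, 310], [50000, 284],
    [60000, 232], [70000, 189], [80000, 130], [90000, 87], [100000, 71],
])

print(freq_table_median(x))                           # 40000.0
print(freq_table_median(np.array([[1, 2], [2, 2]])))  # 1.5
```

For a table this small you can also cross-check the result by expanding it and calling `np.median(np.repeat(x[:, 0], x[:, 1]))`, at the cost of materialising one element per job.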
