How to find the median in a NumPy 2D array with a matching column

From basic math I know the median salary across all jobs listed is 40000, but how would I obtain that using NumPy?

E.g. find the median salary of all jobs listed.

1st column = salary

2nd column = no. of jobs advertised

```
import numpy as np

x = np.array([
    [10000, 329],
    [20000, 329],
    [30000, 323],
    [40000, 310],
    [50000, 284],
    [60000, 232],
    [70000, 189],
    [80000, 130],
    [90000, 87],
    [100000, 71],
])
```

CodePudding user response:

You have a frequency table. You want the first value in x[:, 0] whose cumulative frequency reaches the midpoint of the total count.

You can use:

```
# cumulative frequency
cf = np.cumsum(x[:, 1])

# total items
n = cf[-1]

# 1-indexed position of the middle item
midpoint = (n + 1) / 2

# first salary whose cumulative count reaches the midpoint
median = x[:, 0][cf >= midpoint][0]
```
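As a sanity check, and assuming the counts are small enough to expand in memory, you could rebuild the full list of salaries with np.repeat and let np.median do the work. This is only a cross-check sketch, not a replacement for the cumulative-frequency approach; the name expanded is just illustrative.

```python
import numpy as np

x = np.array([
    [10000, 329],
    [20000, 329],
    [30000, 323],
    [40000, 310],
    [50000, 284],
    [60000, 232],
    [70000, 189],
    [80000, 130],
    [90000, 87],
    [100000, 71],
])

# Repeat each salary by its job count: one array entry per advertised job.
expanded = np.repeat(x[:, 0], x[:, 1])

# Ordinary median over the expanded data.
print(np.median(expanded))  # 40000.0
```

This uses memory proportional to the total number of jobs, so it only makes sense for small tables.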

However, you should add a special case for when the total count is even and the two middle items fall in different rows, so that the median is the mean of two adjacent values, such as:

x = np.array([[1, 2], [2, 2]])

The true median is 1.5, whereas the cumulative-frequency lookup above gives 2.
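One possible way to cover that case is to look up both middle positions and average them. The sketch below assumes the rows are sorted by value and the counts are non-negative integers with a positive total; the function name frequency_table_median is made up for illustration.

```python
import numpy as np

def frequency_table_median(table):
    """Median of a (value, count) table whose rows are sorted by value."""
    values = table[:, 0]
    cf = np.cumsum(table[:, 1])  # cumulative frequency
    n = cf[-1]                   # total number of items

    # 1-indexed positions of the lower and upper middle items.
    # They coincide when n is odd and differ by one when n is even.
    lower_pos = (n + 1) // 2
    upper_pos = n // 2 + 1

    # side="left" returns the first index i with cf[i] >= position.
    lower = values[np.searchsorted(cf, lower_pos, side="left")]
    upper = values[np.searchsorted(cf, upper_pos, side="left")]
    return (lower + upper) / 2

print(frequency_table_median(np.array([[1, 2], [2, 2]])))  # 1.5
```

When both middle items fall in the same row, as in the salary table, the two lookups return the same value and the average leaves it unchanged.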