From basic math I know that the median salary across all jobs listed is 40000, but how would I obtain that using NumPy?
e.g. find the median salary of all jobs listed
1st column = salary
2nd column = no. of jobs advertised
```
import numpy as np

x = np.array([
    [10000, 329],
    [20000, 329],
    [30000, 323],
    [40000, 310],
    [50000, 284],
    [60000, 232],
    [70000, 189],
    [80000, 130],
    [90000, 87],
    [100000, 71],
])
```
CodePudding user response:
You have a frequency table. You are interested in finding the first value from x[:, 0]
corresponding to where the midpoint falls on the cumulative frequency.
You can use:
# cumulative frequency
cf = np.cumsum(x[:, 1])
# total items
n = cf[-1]
midpoint = (n + 1) / 2
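# first value whose cumulative count reaches the midpoint
# (<= rather than <, so an odd n whose middle item is the last one in a row is handled)
median = x[:, 0][midpoint <= cf][0]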
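The same lookup can also be written with `np.searchsorted`, which finds the first row whose cumulative count reaches the midpoint without building a boolean mask. This is only a sketch of an alternative formulation, and it assumes the rows are sorted by salary, as they are in the question's data:

```python
import numpy as np

x = np.array([[10000, 329], [20000, 329], [30000, 323], [40000, 310],
              [50000, 284], [60000, 232], [70000, 189], [80000, 130],
              [90000, 87], [100000, 71]])

cf = np.cumsum(x[:, 1])       # running total of jobs per salary band
midpoint = (cf[-1] + 1) / 2   # 1-based position of the middle job

# side="left" returns the first index i with cf[i] >= midpoint
idx = np.searchsorted(cf, midpoint, side="left")
median = x[idx, 0]
print(median)  # 40000
```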
However, you should add a special case for when the total count is even and the two middle items fall in different rows, so that the median is the mean of the two adjacent values, such as:
x = np.array([[1, 2], [2, 2]])
The true median is 1.5, whereas this method gives 2.
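One possible way to cover that case is to look up both middle positions and average them. The function name `frequency_median` below is made up for illustration, and the sketch assumes rows of `[value, count]` sorted by value with positive integer counts. The last line cross-checks the result by expanding the table with `np.repeat` and calling `np.median`, which is fine for small tables but materialises every item:

```python
import numpy as np

def frequency_median(table):
    """Median of a frequency table with rows [value, count], sorted by value."""
    values, counts = table[:, 0], table[:, 1]
    cf = np.cumsum(counts)
    n = cf[-1]
    # 1-based positions of the lower and upper middle items (equal when n is odd)
    lo_pos, hi_pos = (n + 1) // 2, (n + 2) // 2
    lo = values[np.searchsorted(cf, lo_pos, side="left")]
    hi = values[np.searchsorted(cf, hi_pos, side="left")]
    return (lo + hi) / 2

small = np.array([[1, 2], [2, 2]])
print(frequency_median(small))  # 1.5

# cross-check: expand the table into individual items and take the median
print(np.median(np.repeat(small[:, 0], small[:, 1])))  # 1.5
```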