median(x) does not correctly return the middle value of x

Time:07-01

I have a really frustrating problem with R. What I want is fairly easy: I have a vector of numeric values (without NA) and want to calculate the median value. To perform this easy task I wrote the following line:

#returning 4.0585
medianOfVector <- median(dataFrame$colname)

However, I realized that the value this line returns does not match the number I get when performing the following lines:

#returning 1048
lengthOfVector <- length(dataFrame$colname)
#returning 4.1355
medianOfVector2 <- (dataFrame$colname[524] + dataFrame$colname[525]) / 2

As I understand it, median() should return the value that is exactly in the middle of the vector (or the mean of the two middle values if the length of the vector is even), but this does not seem to be the case. Unfortunately, I can't trace the steps median() performs, so I can't solve the problem. Can anyone help, or tell me where I may have made a mistake?

Answer:

The median is the middle value of the sorted values. Did you sort that column before picking the middle values? Here is a toy demonstration of what goes wrong when the values are unsorted.

## a vector of even length
set.seed(0); x <- sample.int(10)
#[1]  9  4  7  1  2  5  3 10  6  8

## true value
median(x)
#[1] 5.5

## values are unsorted
is.unsorted(x)
#[1] TRUE
## "middle" value
0.5 * (x[length(x) / 2] + x[length(x) / 2 + 1])
#[1] 3.5

## correct calculation with sorted values
sx <- sort(x)
## "middle" value
(sx[length(x) / 2] + sx[length(x) / 2 + 1]) / 2
#[1] 5.5
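To connect this to the data in the question, here is a minimal sketch of a hand-written median that sorts first. `manual_median` is a hypothetical helper, not a base R function, and `dataFrame` below is simulated with `rnorm()` as a stand-in for the real column (1048 values), so its numbers will not match the 4.0585 from the question. For 1048 values the two middle positions of the *sorted* vector are 524 and 525, the same indices used in the question. If you want to trace what R itself does, `getS3method("median", "default")` prints the source of the default method (as far as I know it uses a partial sort rather than a full one, but the result is the same).

```r
## Hypothetical helper (not part of base R): manual median for a numeric
## vector without NA values. Sorting first is the essential step.
manual_median <- function(v) {
  n <- length(v)
  stopifnot(is.numeric(v), n > 0, !anyNA(v))
  half <- (n + 1L) %/% 2L          # lower middle position
  sv <- sort(v)                    # the median is defined on sorted values
  if (n %% 2L == 1L) {
    sv[half]                       # odd length: the single middle value
  } else {
    mean(sv[half + 0:1])           # even length: mean of the two middle values
  }
}

## Toy vector from above (values copied from the output shown there)
x <- c(9, 4, 7, 1, 2, 5, 3, 10, 6, 8)
manual_median(x)
#[1] 5.5
median(x)
#[1] 5.5

## Simulated stand-in for dataFrame$colname (1048 random values, not the real data)
set.seed(1)
dataFrame <- data.frame(colname = rnorm(1048))
sorted <- sort(dataFrame$colname)
all.equal((sorted[524] + sorted[525]) / 2, median(dataFrame$colname))
#[1] TRUE
```

`all.equal()` is used instead of `==` so that tiny floating-point differences between `(a + b) / 2` and `mean()` cannot cause a spurious mismatch.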