Deriving median from grouped frequencies in R

Time:02-19

I have the following example table, for which I need to find the median age of a herd of animals. Not only does it include an age of 0, it also gives a grouped frequency (the number of animals) for each age.

library(tidyverse)
a<-data.frame(Age=c(0,1,2,3,4,5,6,7,8,9),
              Individuals=c(3655,2535,898,235,559,265,258,3659,7895,3655))
# Attempt: weight each age by its share of the herd (this does not yield the median)
a%>%summarise(Age=as.numeric(Age),
          Median=sort(as.numeric(Age)*Individuals/sum(Individuals)))

I understand that the standard median() function does not work directly on grouped frequencies. I tried to be clever and attempted something like median(rep(a$Age, a$Individuals)), but the memory consumption was too high, and I expect it to fail with a larger dataset.

CodePudding user response:

You can uncount the original data frame and then use the standard median function.

a %>% uncount(Individuals) %>% summarise(Median=median(Age))
  Median
1      7

And to check:

> sum(a$Individuals)/2
[1] 11807
> sum(a$Individuals[1:7])
[1] 8405
> sum(a$Individuals[1:8])
[1] 12064

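All good: half the herd is 11807 animals, which falls between the cumulative count for ages 0 to 6 (8405) and the cumulative count for ages 0 to 7 (12064), so the median age is 7.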
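The same check can be done in one pass with a running total. This is only a minimal sketch in base R; the names Cumulative, half and first_reaching are made up for illustration and are not part of the original answer.

```r
# Same data as in the question
a <- data.frame(Age = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
                Individuals = c(3655, 2535, 898, 235, 559, 265, 258, 3659, 7895, 3655))

# Running total of animals up to and including each age
a$Cumulative <- cumsum(a$Individuals)

# Halfway point of the herd
half <- sum(a$Individuals) / 2

# First age whose running total reaches the halfway point
first_reaching <- a$Age[which(a$Cumulative >= half)[1]]

print(a)
print(first_reaching)
```

For this table the running total first reaches the halfway point at age 7, which matches the uncount result.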

CodePudding user response:

You could be a bit clever and do:

a %>%
  arrange(Age) %>%
  summarise(median = Age[findInterval(sum(Individuals)/2, cumsum(Individuals)) + 1])

  median
1      7
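
If the halfway point lands exactly on a cumulative count (possible when the total is even), picking a single age may differ from what median() would return on the expanded data, since median() averages the two middle values. A rough base R sketch that handles that case without expanding the rows could look like the following; grouped_median is a hypothetical helper name, not an existing function, and it assumes non-negative counts with a positive total.

```r
# Hypothetical helper: median of `values` given per-value `counts`,
# computed from cumulative counts instead of expanding the data.
grouped_median <- function(values, counts) {
  stopifnot(length(values) == length(counts), sum(counts) > 0)

  # Sort by value so the cumulative counts follow the value order
  ord <- order(values)
  values <- values[ord]
  counts <- counts[ord]

  n <- sum(counts)
  cum <- cumsum(counts)

  # 1-based positions of the middle observation(s);
  # they coincide when n is odd and are adjacent when n is even
  lo_pos <- floor((n + 1) / 2)
  hi_pos <- ceiling((n + 1) / 2)

  # Value holding each middle position
  lo <- values[which(cum >= lo_pos)[1]]
  hi <- values[which(cum >= hi_pos)[1]]

  (lo + hi) / 2
}

a <- data.frame(Age = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
                Individuals = c(3655, 2535, 898, 235, 559, 265, 258, 3659, 7895, 3655))

grouped_median(a$Age, a$Individuals)
```

On the example table this also gives 7. It only differs from the findInterval version in the tie case, for example values 1 and 2 with counts 2 and 2, where the middle two observations are 1 and 2.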