How to calculate the mean age from the data stated as 2 digit years in R

Time:08-29

I have a data frame like this:

ID <- c("A", "B", "C", "D")
birthday <- c(12, 23, 2, 20)
birthmonth <- c(8, 10, 3, 9)
birthyear <- c(79, 62, 66, 83)
mydf <- data.frame(ID, birthday, birthmonth, birthyear)
mydf
  ID birthday birthmonth birthyear
1  A       12          8        79
2  B       23         10        62
3  C        2          3        66
4  D       20          9        83

As you can see, birth years are given as 2 digits, and the day, month, and year are stored in separate columns. Given such a data frame, how can I calculate the mean age of my sample?

Thank you so much!

CodePudding user response:

We could use lubridate's make_date() to combine the individual columns into a single date column and then calculate the age from it. The code also shows one way to restore the missing century (19/20) in birthyear; the cutoff of 21 is a choice that you might need to tweak for your data.

library(dplyr)
library(lubridate)

mydf |> 
    # 2-digit years above 21 are read as 19xx, the rest as 20xx
    mutate(date = make_date(if_else(birthyear > 21, birthyear + 1900, birthyear + 2000), birthmonth, birthday),
           age  = as.period(interval(date, today()))$year
    )

Output:

  ID birthday birthmonth birthyear       date age
1  A       12          8        79 1979-08-12  43
2  B       23         10        62 1962-10-23  59
3  C        2          3        66 1966-03-02  56
4  D       20          9        83 1983-09-20  38
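
The cutoff of 21 is tied to the year in which the code is run, so it may help to make the pivot explicit. Here is a minimal sketch in base R. The helper name expand_year is hypothetical (it is not part of lubridate or any other package), and it assumes that no birth year lies in the future and that nobody in the sample is older than 99 in the reference year:

```r
# Hypothetical helper: expand a 2-digit year to 4 digits.
# Assumes no future birth years and nobody older than 99 at ref_year.
expand_year <- function(yy, ref_year = as.integer(format(Sys.Date(), "%Y"))) {
  pivot   <- ref_year %% 100   # last two digits of the reference year
  century <- ref_year - pivot  # e.g. 2000 for a reference year of 2022
  # 2-digit years after the pivot must belong to the previous century
  ifelse(yy > pivot, century - 100 + yy, century + yy)
}

expand_year(c(79, 62, 66, 83, 5), ref_year = 2022)
# 1979 1962 1966 1983 2005
```

The result could then be passed to make_date() in place of the if_else() expression.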

And to get the mean age with summarise:

mydf |> 
    # 2-digit years above 21 are read as 19xx, the rest as 20xx
    mutate(date = make_date(if_else(birthyear > 21, birthyear + 1900, birthyear + 2000), birthmonth, birthday),
           age  = as.period(interval(date, today()))$year
    ) |>
    summarise(mean_age = mean(age))

Output:

  mean_age
1       49

Update: Getting the age calculation right (and fast) can be non-trivial; see e.g. the question "Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date".
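
As a rough illustration of what such a calculation involves, here is a minimal base R sketch that counts completed years up to a fixed reference date, so the result does not change from day to day. The function name age_years is hypothetical. The reference date 2022-08-29 is an assumption, chosen because it reproduces the ages shown above. Birthdays on 29 February are treated as not yet reached on 28 February of a non-leap year, which is only one possible convention:

```r
# Hypothetical helper: completed years between birth and ref (base R only).
age_years <- function(birth, ref = Sys.Date()) {
  b <- as.POSIXlt(birth)
  r <- as.POSIXlt(ref)
  age <- r$year - b$year
  # Subtract one year if the birthday has not yet occurred in the reference year
  not_yet <- (r$mon < b$mon) | (r$mon == b$mon & r$mday < b$mday)
  age - as.integer(not_yet)
}

birth <- as.Date(c("1979-08-12", "1962-10-23", "1966-03-02", "1983-09-20"))
ages  <- age_years(birth, ref = as.Date("2022-08-29"))  # assumed reference date
ages        # 43 59 56 38
mean(ages)  # 49
```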
