Background
I've got an R
dataframe, d
:
d <- data.frame(ID = c("a","a","b","b", "c","c","c"),
birthdate = as.Date(c("1980-01-01","1980-01-01","2000-12-23","2000-12-23","1949-03-14","1949-03-14","1949-03-14")),
event_date = as.Date(c("2011-01-01","2012-08-21","2011-12-23","2011-12-31","2013-03-14","2013-04-07","2014-07-14")),
stringsAsFactors=FALSE)
It consists of an ID
code and two dates: a birthdate
and an event_date
. Everyone's got a consistent birthdate, but people have multiple events each, all of these occurring on different dates.
The Problem
I'm trying to calculate the average age of people (ID
s) in d
at their first event. In other words, I'd like to get R
to calculate an "age at first event" by subtracting each ID's first event from their birthdate
, and then sum them and divide by n (3, in this case).
The answer (if my arithmetic isn't too far off this late at night) should be ~35.3 years old.
What I've tried
I'm not too familiar with date work in R
, so I've only gotten so far as mutating a new column that calculates the difference between event_date
and birthdate
for that row:
d <- d %>%
mutate(date_difference = (event_date-birthdate)/365)
But I'm still a ways away from my summary calculation. I'm mainly hung up on how to tell R
to find the first date-difference for each ID
. (Not to mention that dividing by 365 gives me correct years but they're still labeled "days" in the resulting df.)
CodePudding user response:
You can use this code:
d <- d %>%
group_by(ID) %>%
arrange(event_date) %>%
slice(1) %>%
mutate(date_difference = as.numeric((event_date-birthdate)/365)) %>%
ungroup() %>%
mutate(average_age = mean(date_difference))
Output:
# A tibble: 3 × 5
ID birthdate event_date date_difference average_age
<chr> <date> <date> <dbl> <dbl>
1 a 1980-01-01 2011-01-01 31.0 35.4
2 b 2000-12-23 2011-12-23 11.0 35.4
3 c 1949-03-14 2013-03-14 64.0 35.4
CodePudding user response:
As an alternative to dividing by 365, you can use the lubridate::time_length
function. It computes the length of a period in different time units (seconds, minutes, days, years).
library("tidyverse")
dat <- data.frame(
ID = c("a", "a", "b", "b", "c", "c", "c"),
birthdate = as.Date(c("1980-01-01", "1980-01-01", "2000-12-23", "2000-12-23", "1949-03-14", "1949-03-14", "1949-03-14")),
event_date = as.Date(c("2011-01-01", "2012-08-21", "2011-12-23", "2011-12-31", "2013-03-14", "2013-04-07", "2014-07-14")),
stringsAsFactors = FALSE
)
dat_with_age <- dat %>%
group_by(ID) %>%
slice_min(
event_date,
n = 1
) %>%
ungroup() %>%
mutate(
age_at_first_event = lubridate::time_length(event_date - birthdate, unit = "year")
)
dat_with_age
#> # A tibble: 3 × 4
#> ID birthdate event_date age_at_first_event
#> <chr> <date> <date> <dbl>
#> 1 a 1980-01-01 2011-01-01 31.0
#> 2 b 2000-12-23 2011-12-23 11.0
#> 3 c 1949-03-14 2013-03-14 64
dat_with_age %>%
summarise(
mean(age_at_first_event)
)
#> # A tibble: 1 × 1
#> `mean(age_at_first_event)`
#> <dbl>
#> 1 35.3
Created on 2022-03-11 by the reprex package (v2.0.1)