Home > other >  How do I figure out month and year of birth from data that includes multiple observations and age in
How do I figure out month and year of birth from data that includes multiple observations and age in

Time:10-30

I have data that includes multiple observations of the 1000's of people over many years. These data include the age at the time of observation. The data looks like this:

Person A | 2016-01-0 | 35
Person A | 2016-01-3 | 35
Person A | 2016-01-8 | 36

In this case, you'd be able to tell that Person A was born between 1980-01-04 and 1980-01-08 I'd like to be able to capture the date of birth (the month and year of birth will also do - even just the year of birth is fine) for all people in the data. Not sure, however, how to go about doing this without manually looking through and seeing where the age changes and taking note of the corresponding date.

Any help is greatly appreciated.

CodePudding user response:

Each row gives us one possible range for that person. If you take the latest "min_DOB" and the earliest "max_DOB" of those rows for each person, you can get the kind of narrowed-down range you described.

library(dplyr); library(lubridate)
df %>%
  mutate(min_DOB = V2 %>% ymd - years(V3) - 364, # dragons b/c leap years
         max_DOB = V2 %>% ymd - years(V3)) %>%
  group_by(V1) %>%
  summarize(min_DOB = max(min_DOB),
            max_DOB = min(max_DOB))

# A tibble: 1 x 3
  V1       min_DOB    max_DOB   
  <chr>    <date>     <date>    
1 Person A 1980-01-05 1980-01-08
data
df <- data.frame(
  stringsAsFactors = FALSE,
                V1 = c("Person A", "Person A", "Person A"),
                V2 = c("2016-01-01", "2016-01-03", "2016-01-08"),
                V3 = c(35L, 35L, 36L)
)

CodePudding user response:

You may extract the year by removing everything after the first - and subtract the age from it.

df <- transform(df, yob = as.numeric(sub('-.*', '', V2)) - V3)

#        V1        V2 V3  yob
#1 Person A 2016-01-0 35 1981
#2 Person A 2016-01-3 35 1981
#3 Person A 2016-01-8 36 1980

data

df <- structure(list(V1 = c("Person A", "Person A", "Person A"), V2 = c("2016-01-0", 
"2016-01-3", "2016-01-8"), V3 = c(35L, 35L, 36L)), 
row.names = c(NA, -3L), class = "data.frame")
  • Related