How do I plot a histogram of ages in a tibble with multiple observations per patient?

I have a tibble with one row per observation. The columns hold variables such as ID number, DOB and test results:

ID	DOB	result
a	1940-01-01	15
a	1940-01-01	17
b	1933-05-20	11
b	1933-05-20	20

I want to make a histogram of the patients' ages, but I can only get the histogram to count every occurrence of each DOB, so I end up with n = patients * observations per patient instead of n = patients.

I tried:

ggplot(d1, aes(eeptools::age_calc(dob = as.Date(DOB), enddate = Sys.Date(), units = 'years'))) + geom_histogram(binwidth = 1)

How do I subset so I only get one DOB for each ID? Thanks!

CodePudding user response：

If you are not interested in the result column, you could simply drop it with subset and then use distinct to remove all duplicate rows. I am a bit unsure what you mean by years (age in years, or year of birth?), but taking it as age in years as of today, I got this:


# Import packages
library(ggplot2)
library(dplyr)

# Make dataframe
df <- data.frame(ID = c("a", "a", "b", "b"),
       DOB = c("1940-01-01", "1940-01-01", "1933-05-20", "1933-05-20"),
       result = c(15, 17, 11, 20))


# Convert DOB to Date class (it most likely already is in your data)
df %>%  mutate(date = as.Date(DOB),
               years = lubridate::year(date),
               # Approximate age: current year minus year of birth
               age = lubridate::year(Sys.Date()) - years) %>% 

# Subset data to remove results
  subset(select = - result) %>% 

# Remove duplicates using distinct
  distinct() %>% 
  
# Plot
  ggplot(aes(x = age)) +
  geom_histogram(bins = 2)
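
As a possible variation on the above, distinct can also be given the columns to keep, which drops the result column and the duplicate rows in one step. The sketch below assumes the same toy data; the names patients and age are only illustrative, and the age is an approximation (days elapsed divided by 365.25) rather than the eeptools::age_calc result from the question.

```r
library(dplyr)
library(ggplot2)

# Same toy data as in the answer above
df <- data.frame(
  ID = c("a", "a", "b", "b"),
  DOB = c("1940-01-01", "1940-01-01", "1933-05-20", "1933-05-20"),
  result = c(15, 17, 11, 20)
)

# Keep one row per patient; distinct() with named columns
# returns only those columns, so result is dropped here
patients <- df %>%
  distinct(ID, DOB) %>%
  mutate(
    DOB = as.Date(DOB),
    # Assumption: approximate age in years as days elapsed / 365.25
    age = as.numeric(difftime(Sys.Date(), DOB, units = "days")) / 365.25
  )

# Each patient now contributes exactly one value to the histogram
p <- ggplot(patients, aes(x = age)) +
  geom_histogram(binwidth = 1)
```

Printing p (or calling print(p) inside a script) draws the plot. If a patient could appear with inconsistent DOB entries, distinct(ID, .keep_all = TRUE) would keep only the first row per ID instead.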