How do I plot a histogram of ages in a tibble with multiple observations per patient?

Time:01-18

I have a tibble with one row per observation. The columns hold variables such as ID number, DOB, and test results.

d1

ID DOB result
a 1940-01-01 15
a 1940-01-01 17
b 1933-05-20 11
b 1933-05-20 20

I want to make a histogram of the patients' ages, but I can only get the histogram to show every occurrence of the DOB, so I end up with n = patients * observations per patient instead of n = patients.

I tried:

ggplot(d1, aes(eeptools::age_calc(dob = as.Date(DOB), enddate = Sys.Date(), units = 'years'))) + geom_histogram(binwidth = 1)

How do I subset so I only get one DOB for each ID? Thanks!

CodePudding user response:

If you are not interested in the result column, you can simply drop it with subset and then use the function distinct to remove all duplicate rows. I am a bit unsure what you mean by years (is it age in years or year of birth?), but taking it as age in years as of today, I got this:


# Import packages
library(ggplot2)
library(dplyr)

# Make dataframe
df <- data.frame(ID = c("a", "a", "b", "b"),
       DOB = c("1940-01-01", "1940-01-01", "1933-05-20", "1933-05-20"),
       result = c(15, 17, 11, 20))


#Mutate date to correct class - it most likely already is in your example
df %>% mutate(date = as.Date(DOB),
              years = lubridate::year(date),
              # Age by calendar year (ignores whether the birthday has passed)
              age = lubridate::year(Sys.Date()) - years) %>%

# Subset data to remove results
  subset(select = -result) %>%

# Remove duplicates using distinct
  distinct() %>%

# Plot
  ggplot(aes(x = age)) +
  geom_histogram(bins = 2)
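
As a variation on the same idea, here is a minimal sketch that deduplicates directly on ID and DOB with distinct(), so the result column never needs to be dropped by hand. It assumes DOB is identical across all rows of a given ID, and it approximates age as days elapsed divided by 365.25 rather than using eeptools::age_calc. The names d1, patients and p are only illustrative.

```r
library(dplyr)
library(ggplot2)

# Example data mirroring the question: one row per observation
d1 <- data.frame(ID = c("a", "a", "b", "b"),
                 DOB = c("1940-01-01", "1940-01-01", "1933-05-20", "1933-05-20"),
                 result = c(15, 17, 11, 20))

# Keep one row per patient; distinct() with column names drops the other columns
patients <- d1 %>%
  distinct(ID, DOB) %>%
  mutate(DOB = as.Date(DOB),
         # Approximate age in years (assumption: 365.25 days per year)
         age = as.numeric(Sys.Date() - DOB) / 365.25)

# Histogram with one value per patient
p <- ggplot(patients, aes(x = age)) +
  geom_histogram(binwidth = 1)
p
```

If a patient could appear with more than one DOB because of data entry errors, distinct(ID, .keep_all = TRUE) would keep only the first row per ID instead.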
