I have a tibble with one row per observation. The columns hold variables such as ID number, DOB, and test results.
d1
ID | DOB | result |
---|---|---|
a | 1940-01-01 | 15 |
a | 1940-01-01 | 17 |
b | 1933-05-20 | 11 |
b | 1933-05-20 | 20 |
I want to make a histogram of the patients' ages, but the histogram counts every occurrence of each DOB, so I get n = patients * observations per patient instead of n = patients.
I tried:
ggplot(d1, aes(eeptools::age_calc(dob = as.Date(DOB), enddate = Sys.Date(), units = 'years'))) + geom_histogram(binwidth = 1)
How do I subset so I only get one DOB for each ID? Thanks!
CodePudding user response:
If you are not interested in the result column, you could simply drop it with subset and then use distinct to remove all duplicate rows. I am not sure what you mean by years (age in years, or year of birth?), but taking it as age in years as of today, I got this:
# Import packages
library(ggplot2)
library(dplyr)
# Make dataframe
df <- data.frame(ID = c("a", "a", "b", "b"),
                 DOB = c("1940-01-01", "1940-01-01", "1933-05-20", "1933-05-20"),
                 result = c(15, 17, 11, 20))
# Convert DOB to Date class - it most likely already is in your data
df %>% mutate(date = as.Date(DOB),
              years = lubridate::year(date),
              # Approximate age in years as of 2023
              age = 2023 - years) %>%
  # Subset data to remove results
  subset(select = -result) %>%
  # Remove duplicates using distinct
  distinct() %>%
  # Plot (note the + that joins the histogram layer to the plot)
  ggplot(aes(x = age)) +
  geom_histogram(bins = 2)
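A variation on the same idea, as a minimal sketch: distinct can be given the columns to deduplicate on directly, which skips the separate subset step. The example below rebuilds the question's d1 from the values shown there and computes age from Sys.Date() rather than a fixed year. The 365.25 days-per-year divisor is an approximation I am assuming here, and the names patients and p are only illustrative.

```r
library(dplyr)
library(ggplot2)

# Example data shaped like the question's d1 (values copied from the question)
d1 <- tibble(
  ID = c("a", "a", "b", "b"),
  DOB = c("1940-01-01", "1940-01-01", "1933-05-20", "1933-05-20"),
  result = c(15, 17, 11, 20)
)

# Keep one row per unique ID/DOB pair; only these two columns are retained
patients <- d1 %>%
  distinct(ID, DOB) %>%
  mutate(
    DOB = as.Date(DOB),
    # Approximate age in years: elapsed days / 365.25 (an assumption, not exact)
    age = as.numeric(Sys.Date() - DOB) / 365.25
  )

# Each patient now contributes exactly one age to the histogram
p <- ggplot(patients, aes(x = age)) +
  geom_histogram(binwidth = 1)
p
```

If the same ID could appear with more than one DOB, distinct(ID, DOB) would keep each combination. In that case distinct(ID, .keep_all = TRUE) keeps only the first row per ID instead.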