I have patient follow-up visits - each patient has several visits, and are missing data from some visits. I'm pulling the value from the first visit that has non-missing data (i.e. their earliest follow-up with data).
How could I also pull the date that goes along with that first non-missing data value? (All patients have the date filled in, regardless of if the value was missing)
sample data:
library(tibble)
library(dplyr)
data <- tribble(
~PatientID, ~Visit_Date, ~Status,
155, "15-Mar-2016", NA,
155, "20-Apr-2017", 2040,
155, "09-May-2018", 2050,
200, "04-Apr-2004", NA,
200, "10-Jun-2005", NA,
200, "07-Jul-2006", 2080
)
I'm thinking the code will go something like
data %>%
group_by(PatientID) %>%
mutate(first_status = dplyr::first(na.omit(Status))) %>%
mutate(first_date = ___________)
Desired output:
tribble(
~PatientID, ~Visit_Date, ~Status, ~first_status, ~first_date,
155, "15-Mar-2016", NA, 2040, "20-Apr-2017",
155, "20-Apr-2017", 2040, 2040, "20-Apr-2017",
155, "09-May-2018", 2050, 2040, "20-Apr-2017",
200, "04-Apr-2004", NA, 2080, "07-Jul-2006",
200, "10-Jun-2005", NA, 2080, "07-Jul-2006",
200, "07-Jul-2006", 2080, 2080, "07-Jul-2006",
)
CodePudding user response:
We may use complete.cases
or is.na
to get the logical vector for subsetting (assuming Visit_Date is ordered by 'PatientID')
library(dplyr)
data %>%
group_by(PatientID) %>%
mutate(first_status = Status[!is.na(Status)][1],
first_date = Visit_Date[!is.na(Status)][1]) %>%
ungroup
-output
# A tibble: 6 × 5
PatientID Visit_Date Status first_status first_date
<dbl> <chr> <dbl> <dbl> <chr>
1 155 15-Mar-2016 NA 2040 20-Apr-2017
2 155 20-Apr-2017 2040 2040 20-Apr-2017
3 155 09-May-2018 2050 2040 20-Apr-2017
4 200 04-Apr-2004 NA 2080 07-Jul-2006
5 200 10-Jun-2005 NA 2080 07-Jul-2006
6 200 07-Jul-2006 2080 2080 07-Jul-2006