I have a dataset that looks like this:
Study_ID Death_or_Follow_Up_Date Follow_Up_Date
1 100 012022 052321
2 200 031021 091121
3 300 051822 NA
4 400 NA 042322
5 500 042521 NA
6 600 NA 062720
7 700 061020 NA
8 800 032021 062520
9 900 NA 012021
10 1000 NA 111021
Wherever there is a missing, NA value in Death_or_Follow_Up_Date, I would like to fill it in with data from the Follow_Up_Date column. However, where there is data in Death_or_Follow_Up_Date, I do not want it replaced with the data in Follow_Up_Date.
In other words, I would just like the missing values in Death_or_Follow_Up_Date to be filled, but for the rest to remain exactly the same.
(This is due to the fact that I want the death date over the follow-up date, but follow-up date where there is no death date).
My desired output would look like this:
Study_ID Death_or_Follow_Up_Date Follow_Up_Date
1 100 012022 052321
2 200 031021 091121
3 300 051822 NA
4 400 042322 042322
5 500 042521 NA
6 600 062720 062720
7 700 061020 NA
8 800 032021 062520
9 900 012021 012021
10 1000 111021 111021
Where Death_or_Follow_Up_Date has a full set of data for every patient.
How can I go about doing this?
Reproducible data:
data<-data.frame(Study_ID=c("100","200","300","400","500","600","700","800","900","1000"),Death_or_Follow_Up_Date=c("012022","031021","051822","NA","042521","NA","061020","032021","NA","NA"),Follow_Up_Date=c("052321","091121","NA","042322","NA","062720","NA","062520","012021","111021"))
CodePudding user response:
An option is to use an if_else
statement.
library(dplyr)
data %>%
mutate(Death_or_Follow_Up_Date = if_else(
Death_or_Follow_Up_Date == "NA" & Follow_Up_Date != "NA",
Follow_Up_Date,
Death_or_Follow_Up_Date))
# Study_ID Death_or_Follow_Up_Date Follow_Up_Date
#1 100 012022 052321
#2 200 031021 091121
#3 300 051822 NA
#4 400 042322 042322
#5 500 042521 NA
#6 600 062720 062720
#7 700 061020 NA
#8 800 032021 062520
#9 900 012021 012021
#10 1000 111021 111021
Note however that your NA
s are not true NAs; they are character
strings "NA"
. So is.na()
will not work (compare is.na(NA)
vs. is.na("NA")
).
CodePudding user response:
Below we have assumed that the data contains actual NA's rather than character strings "NA"
and use coalesce
.
data %>%
mutate(across(Death_or_Follow_Up_Date, coalesce, Follow_Up_Date))
Note
Assumed input.
data <-
structure(list(Study_ID = c("100", "200", "300", "400", "500",
"600", "700", "800", "900", "1000"), Death_or_Follow_Up_Date = c("012022",
"031021", "051822", NA, "042521", NA, "061020", "032021", NA,
NA), Follow_Up_Date = c("052321", "091121", NA, "042322", NA,
"062720", NA, "062520", "012021", "111021")), class = "data.frame", row.names = c(NA,
-10L))