How to fill in missing values with another column?-CodePudding

I have a dataset that looks like this:

   Study_ID Death_or_Follow_Up_Date Follow_Up_Date
1       100                  012022         052321
2       200                  031021         091121
3       300                  051822             NA
4       400                      NA         042322
5       500                  042521             NA
6       600                      NA         062720
7       700                  061020             NA
8       800                  032021         062520
9       900                      NA         012021
10     1000                      NA         111021

Wherever there is a missing, NA value in Death_or_Follow_Up_Date, I would like to fill it in with data from the Follow_Up_Date column. However, where there is data in Death_or_Follow_Up_Date, I do not want it replaced with the data in Follow_Up_Date.

In other words, I would just like the missing values in Death_or_Follow_Up_Date to be filled, but for the rest to remain exactly the same.

(This is due to the fact that I want the death date over the follow-up date, but follow-up date where there is no death date).

My desired output would look like this:

   Study_ID Death_or_Follow_Up_Date Follow_Up_Date
1       100                  012022         052321
2       200                  031021         091121
3       300                  051822             NA
4       400                  042322         042322
5       500                  042521             NA
6       600                  062720         062720
7       700                  061020             NA
8       800                  032021         062520
9       900                  012021         012021
10     1000                  111021         111021

Where Death_or_Follow_Up_Date has a full set of data for every patient.

How can I go about doing this?

Reproducible data:

data<-data.frame(Study_ID=c("100","200","300","400","500","600","700","800","900","1000"),Death_or_Follow_Up_Date=c("012022","031021","051822","NA","042521","NA","061020","032021","NA","NA"),Follow_Up_Date=c("052321","091121","NA","042322","NA","062720","NA","062520","012021","111021"))

CodePudding user response：

An option is to use an if_else statement.

library(dplyr)
data %>%
    mutate(Death_or_Follow_Up_Date = if_else(
        Death_or_Follow_Up_Date == "NA" & Follow_Up_Date != "NA",
        Follow_Up_Date,
        Death_or_Follow_Up_Date))
#   Study_ID Death_or_Follow_Up_Date Follow_Up_Date
#1       100                  012022         052321
#2       200                  031021         091121
#3       300                  051822             NA
#4       400                  042322         042322
#5       500                  042521             NA
#6       600                  062720         062720
#7       700                  061020             NA
#8       800                  032021         062520
#9       900                  012021         012021
#10     1000                  111021         111021

Note however that your NAs are not true NAs; they are character strings "NA". So is.na() will not work (compare is.na(NA) vs. is.na("NA")).

CodePudding user response：

Below we have assumed that the data contains actual NA's rather than character strings "NA" and use coalesce.

data %>%
  mutate(across(Death_or_Follow_Up_Date, coalesce, Follow_Up_Date))

Note

Assumed input.

data <- 
structure(list(Study_ID = c("100", "200", "300", "400", "500", 
"600", "700", "800", "900", "1000"), Death_or_Follow_Up_Date = c("012022", 
"031021", "051822", NA, "042521", NA, "061020", "032021", NA, 
NA), Follow_Up_Date = c("052321", "091121", NA, "042322", NA, 
"062720", NA, "062520", "012021", "111021")), class = "data.frame", row.names = c(NA, 
-10L))