Home > OS >  How to fill in missing values with another column?
How to fill in missing values with another column?

Time:07-20

I have a dataset that looks like this:

   Study_ID Death_or_Follow_Up_Date Follow_Up_Date
1       100                  012022         052321
2       200                  031021         091121
3       300                  051822             NA
4       400                      NA         042322
5       500                  042521             NA
6       600                      NA         062720
7       700                  061020             NA
8       800                  032021         062520
9       900                      NA         012021
10     1000                      NA         111021

Wherever there is a missing, NA value in Death_or_Follow_Up_Date, I would like to fill it in with data from the Follow_Up_Date column. However, where there is data in Death_or_Follow_Up_Date, I do not want it replaced with the data in Follow_Up_Date.

In other words, I would just like the missing values in Death_or_Follow_Up_Date to be filled, but for the rest to remain exactly the same.

(This is due to the fact that I want the death date over the follow-up date, but follow-up date where there is no death date).

My desired output would look like this:

   Study_ID Death_or_Follow_Up_Date Follow_Up_Date
1       100                  012022         052321
2       200                  031021         091121
3       300                  051822             NA
4       400                  042322         042322
5       500                  042521             NA
6       600                  062720         062720
7       700                  061020             NA
8       800                  032021         062520
9       900                  012021         012021
10     1000                  111021         111021

Where Death_or_Follow_Up_Date has a full set of data for every patient.

How can I go about doing this?

Reproducible data:

data<-data.frame(Study_ID=c("100","200","300","400","500","600","700","800","900","1000"),Death_or_Follow_Up_Date=c("012022","031021","051822","NA","042521","NA","061020","032021","NA","NA"),Follow_Up_Date=c("052321","091121","NA","042322","NA","062720","NA","062520","012021","111021"))

CodePudding user response:

An option is to use an if_else statement.

library(dplyr)
data %>%
    mutate(Death_or_Follow_Up_Date = if_else(
        Death_or_Follow_Up_Date == "NA" & Follow_Up_Date != "NA",
        Follow_Up_Date,
        Death_or_Follow_Up_Date))
#   Study_ID Death_or_Follow_Up_Date Follow_Up_Date
#1       100                  012022         052321
#2       200                  031021         091121
#3       300                  051822             NA
#4       400                  042322         042322
#5       500                  042521             NA
#6       600                  062720         062720
#7       700                  061020             NA
#8       800                  032021         062520
#9       900                  012021         012021
#10     1000                  111021         111021

Note however that your NAs are not true NAs; they are character strings "NA". So is.na() will not work (compare is.na(NA) vs. is.na("NA")).

CodePudding user response:

Below we have assumed that the data contains actual NA's rather than character strings "NA" and use coalesce.

data %>%
  mutate(across(Death_or_Follow_Up_Date, coalesce, Follow_Up_Date))

Note

Assumed input.

data <- 
structure(list(Study_ID = c("100", "200", "300", "400", "500", 
"600", "700", "800", "900", "1000"), Death_or_Follow_Up_Date = c("012022", 
"031021", "051822", NA, "042521", NA, "061020", "032021", NA, 
NA), Follow_Up_Date = c("052321", "091121", NA, "042322", NA, 
"062720", NA, "062520", "012021", "111021")), class = "data.frame", row.names = c(NA, 
-10L))
  • Related