I have a (very messy) dataset that is the product of the merging of a few datasets. It currently looks like this:
Study_ID Status Death_status death
1 100 1 NA NA
2 200 1 NA NA
3 300 0 NA NA
4 400 NA 0 NA
5 500 NA 1 NA
6 600 NA 0 NA
7 700 NA NA 0
8 800 NA NA 1
9 900 NA NA 1
10 1000 NA NA 0
I would like to create a new column that combines all 3 of the columns for each patient.
My desired output would look something like this:
Study_ID New_Death_Status Status Death_status death
1 100 1 1 NA NA
2 200 1 1 NA NA
3 300 0 0 NA NA
4 400 0 NA 0 NA
5 500 1 NA 1 NA
6 600 0 NA 0 NA
7 700 0 NA NA 0
8 800 1 NA NA 1
9 900 1 NA NA 1
10 1000 0 NA NA 0
Where New_Death_Status has a full set of data for every patient.
How can I go about doing this?
Reproducible data:
data<-data.frame(Study_ID=c("100","200","300","400","500","600","700","800","900","1000"),Status=c("1","1","0","NA","NA","NA","NA","NA","NA","NA"),Death_status=c("NA","NA","NA","0","1","0","NA","NA","NA","NA"),death=c("NA","NA","NA","NA","NA","NA","0","1","1","0"))
> data
CodePudding user response:
Assuming we know that know patient will have more than one column (meaning we can safely ignore everything after the first column with non-NA
data), then we can coalesce
it.
However ... your data has literal "NA"
strings instead of the reserveds symbol NA
. I think that may be a mistake in your processing somewhere, so I'll "fix" them to be NA
(and still strings):
library(dplyr)
data %>%
mutate(
# this step just replaces the literal "NA" with the symbol NA
across(c(Status, Death_status, death), ~ if_else(. == "NA", .[NA], .)),
New_Death_Status = coalesce(Status, Death_status, death)
)
# Study_ID Status Death_status death New_Death_Status
# 1 100 1 <NA> <NA> 1
# 2 200 1 <NA> <NA> 1
# 3 300 0 <NA> <NA> 0
# 4 400 <NA> 0 <NA> 0
# 5 500 <NA> 1 <NA> 1
# 6 600 <NA> 0 <NA> 0
# 7 700 <NA> <NA> 0 0
# 8 800 <NA> <NA> 1 1
# 9 900 <NA> <NA> 1 1
# 10 1000 <NA> <NA> 0 0
coalesce
returns the first non-NA
found among its vector arguments, so it will silently discard any subsequent non-NA
value present. Also, it will complain if all of the classes are not the same; they are all strings here, but if your processing is changing classes