I have two data frames. df1
has some NAs in a column that I'm interested in, and df2
has this same column but with the values I need. I want to fill those values in df1 according to the rownames (the rownames in df2 are found in df1).
For example, df1 would be something like this:
Age Gender
Patient_1 NA Male # Has NA
Patient_3 30 Male
Patient_66 45 Female
Patient_10 NA Female # Has NA
Patient_11 67 Female
Patient_23 NA Male # Has NA
and df2 contains the same information but only for the samples that have NA in the age column in df1:
Age Gender
Patient_1 59 Male
Patient_10 80 Female
Patient_23 21 Male
Now how do I fill those Age values, from df2 to df1, for each sample?
I tried this:
for(i in 1:length(df1$Age)){
if(rownames(df1)[i] == rownames(df2)[i]){
df1$Age[[i]] = df2$Age[[i]]
}
}
That didn't work as it filled only a subset of the values for some reason, some values are still NA in df1.
CodePudding user response:
Since df2
contains data for all NA rows in df1
, you can simply index all NA rows in df1
using the rownames of df2
and then overwrite their Age
values with the df2$Age
column.
df1[rownames(df2), "Age"] <- df2$Age
CodePudding user response:
The dplyr
and tibble
packages have nice functions for this. First convert to a tibble
with tibble::rownames_to_column()
. Then use dplyr::rows_patch()
to replace the NA values with the corresponding values by patient_id
.
library(dplyr)
library(tibble)
df1 |>
rownames_to_column("patient_id") |>
rows_patch(
rownames_to_column(df2, "patient_id")
)
# Matching, by = "patient_id"
# patient_id Age Gender
# 1 Patient_1 59 Male
# 2 Patient_3 30 Male
# 3 Patient_66 45 Female
# 4 Patient_10 80 Female
# 5 Patient_11 67 Female
# 6 Patient_23 21 Male