Fill in values of a data frame from another data frame-CodePudding

I have two data frames. df1 has some NAs in a column that I'm interested in, and df2 has this same column but with the values I need. I want to fill those values in df1 according to the rownames (the rownames in df2 are found in df1).

For example, df1 would be something like this:

              Age    Gender     
Patient_1      NA     Male  # Has NA
Patient_3      30     Male
Patient_66     45     Female
Patient_10     NA     Female  # Has NA
Patient_11     67     Female
Patient_23     NA     Male  # Has NA

and df2 contains the same information but only for the samples that have NA in the age column in df1:

              Age    Gender
Patient_1     59      Male
Patient_10    80      Female
Patient_23    21      Male

Now how do I fill those Age values, from df2 to df1, for each sample?

I tried this:

for(i in 1:length(df1$Age)){
  
  if(rownames(df1)[i] == rownames(df2)[i]){
    
    df1$Age[[i]] = df2$Age[[i]]
    
  }
}

That didn't work as it filled only a subset of the values for some reason, some values are still NA in df1.

CodePudding user response：

Since df2 contains data for all NA rows in df1, you can simply index all NA rows in df1 using the rownames of df2 and then overwrite their Age values with the df2$Age column.

df1[rownames(df2), "Age"] <- df2$Age

CodePudding user response：

The dplyr and tibble packages have nice functions for this. First convert to a tibble with tibble::rownames_to_column(). Then use dplyr::rows_patch() to replace the NA values with the corresponding values by patient_id.

library(dplyr)
library(tibble)

df1 |>
    rownames_to_column("patient_id") |>
    rows_patch(
        rownames_to_column(df2, "patient_id")
    )

# Matching, by = "patient_id"
#   patient_id Age Gender
# 1  Patient_1  59   Male
# 2  Patient_3  30   Male
# 3 Patient_66  45 Female
# 4 Patient_10  80 Female
# 5 Patient_11  67 Female
# 6 Patient_23  21   Male