Is there a function in R that will allow me to compare two dataframes and correct one based on the o

Time:01-23

I am trying to build a reference data frame with two columns: names and age. A second data frame also has names and ages, but its names are not in alphabetical order, and some of them do not appear in the first data frame. The first data frame must be used to correct the ages of the names that do appear, and to categorise the age of names that do not appear as "unclassified".

e.g:

df1 <- data.frame (Names = c("Cal", "Ben"),
                   Age = c(12, 35))

df2 <- data.frame (Names = c("Cal", "Ben", "Frank"),
                   Age = c(10, 25, 60))

With this line of code:

my_range <- 1:nrow(df2)

for (i in my_range) {
  if (df2$Name[i] %in% df1$Name[i]) {
    df2$Age[i] <- df1$Age[i]
  } else {
    df2$Age[i] <- "Not Classified"
  }
}

I get the following:

Names Age
Cal   12
Ben   35
Frank Not Classified

This is the kind of output I want. However, this does not work when df2 names are not in alphabetical order. I need df2 to correct its ages based on df1 irrespective of how the data is sitting in df2.

CodePudding user response:

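library(tidyverse)

# Look up each df2 name in df1 with match(), so row order does not matter;
# names missing from df1 are labelled "Unclassified".
df2 %>% 
  mutate(Age = if_else(Names %in% df1$Names,
                       as.character(df1$Age[match(Names, df1$Names)]),
                       "Unclassified"))

  Names          Age
1   Cal           12
2   Ben           35
3 Frank Unclassified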
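
A join is another way to make the lookup independent of row order. The following is only a sketch: df2_shuffled is a made-up reordering of df2 used for illustration, and it relies on dplyr's left_join(), which keeps the rows of its first argument in their original order.

```r
library(dplyr)

df1 <- data.frame(Names = c("Cal", "Ben"), Age = c(12, 35))

# Hypothetical reordering of df2, to show that row order does not matter
df2_shuffled <- data.frame(Names = c("Frank", "Ben", "Cal"),
                           Age = c(60, 25, 10))

result <- df2_shuffled %>%
  select(Names) %>%                      # drop the ages that need correcting
  left_join(df1, by = "Names") %>%       # bring in df1's ages; NA where no match
  mutate(Age = coalesce(as.character(Age), "Unclassified"))

print(result)
```

Frank has no match in df1, so his Age becomes "Unclassified"; Ben and Cal take their ages from df1.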

CodePudding user response:

df_new <- merge(df1, df2, by = "Names", all = TRUE)
df_new$corrected_age <- ifelse(is.na(df_new$Age.x), "Unclassified", df_new$Age.x)

Returns:

  Names Age.x Age.y corrected_age
1   Ben    35    25            35
2   Cal    12    10            12
3 Frank    NA    60   Unclassified

Just be aware that the imputed column is of type character now!
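
Note that merge() sorts the result by Names by default, and all = TRUE also keeps names that appear only in df1. If df2 should keep its own row order and columns, a base R lookup with match() is one possible alternative. This is a minimal sketch using the example data from the question; idx is just a helper name chosen here.

```r
df1 <- data.frame(Names = c("Cal", "Ben"), Age = c(12, 35))
df2 <- data.frame(Names = c("Cal", "Ben", "Frank"), Age = c(10, 25, 60))

# Position of each df2 name within df1; NA when the name is absent from df1
idx <- match(df2$Names, df1$Names)

# Take the age from df1 where a match exists, otherwise label the row.
# The Age column becomes character because of the "Unclassified" label.
df2$Age <- ifelse(is.na(idx), "Unclassified", df1$Age[idx])

print(df2)
```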
