So I am trying to create a reference data frame with two columns: the first column holds names and the second holds ages. A second data frame also has names and ages, but its names are not in the same order, and some of its names do not appear in the first data frame. The first data frame must be used to correct the ages of the names that do appear, and to categorise the age of names that do not appear as "unclassified".
For example:
df1 <- data.frame(Names = c("Cal", "Ben"),
                  Age = c(12, 35))
df2 <- data.frame(Names = c("Cal", "Ben", "Frank"),
                  Age = c(10, 25, 60))
With this loop:
my_range <- 1:nrow(df2)
for (i in my_range) {
  # Compares the name in row i of df2 only with the name in row i of df1
  if (df2$Names[i] %in% df1$Names[i]) {
    df2$Age[i] <- df1$Age[i]
  } else {
    df2$Age[i] <- "unclassified"
  }
}
I get the following:
  Names          Age
1   Cal           12
2   Ben           35
3 Frank unclassified
This is the kind of output I want. However, this does not work when the names in df2 are not in the same order as in df1. I need df2 to correct its ages based on df1 regardless of how the rows in df2 are ordered.
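The loop compares row i of df2 only with row i of df1, so it breaks as soon as the two data frames are ordered differently. A minimal sketch of the same loop that instead searches the whole Names column of df1 with base R's match(), assuming the column is called Names in both data frames. Here df2 is deliberately reordered (illustrative data only) to show that row order no longer matters:

```r
df1 <- data.frame(Names = c("Cal", "Ben"), Age = c(12, 35))
# Illustrative reordering of df2 to demonstrate order independence
df2 <- data.frame(Names = c("Ben", "Frank", "Cal"), Age = c(25, 60, 10))

# Character result vector, so numbers and the label can share one column
corrected <- character(nrow(df2))
for (i in seq_len(nrow(df2))) {
  # Position of this name anywhere in df1, or NA if it is absent
  j <- match(df2$Names[i], df1$Names)
  if (!is.na(j)) {
    corrected[i] <- as.character(df1$Age[j])
  } else {
    corrected[i] <- "unclassified"
  }
}
df2$Age <- corrected
df2
```

After the loop, Ben gets 35 and Cal gets 12 from df1, and Frank is labelled "unclassified", even though the rows are in a different order from df1.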
Answer:
library(tidyverse)
df2 %>%
  # Take the reference age from df1 for known names, otherwise the label
  mutate(Age = if_else(Names %in% df1$Names,
                       as.character(df1$Age[match(Names, df1$Names)]),
                       "Unclassified"))
  Names          Age
1   Cal           12
2   Ben           35
3 Frank Unclassified
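For comparison, the same lookup can be written without a loop or any packages. This is a base-R sketch offered as an alternative to the tidyverse answer, not part of it: match() returns, for each name in df2, its row in df1, or NA when the name is missing.

```r
df1 <- data.frame(Names = c("Cal", "Ben"), Age = c(12, 35))
df2 <- data.frame(Names = c("Cal", "Ben", "Frank"), Age = c(10, 25, 60))

# Row of df1 that holds each df2 name; NA where the name is not in df1
idx <- match(df2$Names, df1$Names)

# Reference age where a match exists, otherwise the "Unclassified" label
df2$Age <- ifelse(is.na(idx), "Unclassified", as.character(df1$Age[idx]))
df2
```

Because the lookup is by name rather than by position, reordering the rows of df2 does not change which age each name receives.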
Answer:
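# Keep every row of df2; Age.x comes from df1 and Age.y from df2
df_new <- merge(df1, df2, by = "Names", all.y = TRUE)
# Names that are missing from df1 have NA in Age.x
df_new$corrected_age <- ifelse(is.na(df_new$Age.x), "Unclassified", df_new$Age.x)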
Returns:
  Names Age.x Age.y corrected_age
1   Ben    35    25            35
2   Cal    12    10            12
3 Frank    NA    60  Unclassified
Just be aware that corrected_age is now a character column, because it mixes ages with the "Unclassified" label!
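If a character column is inconvenient (for example, because you later want to compute with the ages), one option is to keep the corrected ages numeric, with NA for unknown names, and put the label in a separate column. A sketch, where the status column and its "Classified" value are hypothetical names chosen for illustration:

```r
df1 <- data.frame(Names = c("Cal", "Ben"), Age = c(12, 35))
df2 <- data.frame(Names = c("Cal", "Ben", "Frank"), Age = c(10, 25, 60))

# Numeric reference age; NA where the name does not appear in df1
df2$corrected_age <- df1$Age[match(df2$Names, df1$Names)]

# Hypothetical status column that carries the label instead of the age column
df2$status <- ifelse(is.na(df2$corrected_age), "Unclassified", "Classified")
df2
```

This way numeric functions such as mean(df2$corrected_age, na.rm = TRUE) still work on the age column.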