Home > database >  How to match a value in a column of one dataframe with the header of another dataframe in R
How to match a value in a column of one dataframe with the header of another dataframe in R

Time:10-01

I have a dataframe to show the completion of course by members.

employees <- c("Apple", "Apple", "Apple", "Betty", "Betty", "Carol", "Carol", "Carol")
course <- c("Course_A", "Course_F", "Course_C", "Course_B", "Course_A", "Course_D", "Course_E", "Course_A")
completion <- c(1, 0, 1, 1, 1, 0, 1, 1)
df_courses <- data.frame(employees, course, completion)

I have another dataframe which summarises the completion course of members

employees <- c("Apple", "Betty", "Carol")
Course_A <- c("Not Completed", "Not Completed", "Not Applicable")
Course_B <- c("Not Completed", "Not Completed", "Not Completed")
Course_C <- c("Not Completed", "Not Completed", "Not Completed")
Course_D <- c("Not Completed", "Not Completed", "Not Completed")
Course_E <- c("Not Completed", "Not Completed", "Not Completed")
df_summary <- data.frame(employees, Course_A, Course_B, Course_C, Course_D, Course_E)

I need to update df_summary with the word "Completed" if the completion shows 1 in df_courses and the original text in df_summary is not "Not Applicable".

Do note that some courses in df_courses may not appear in df_summary as they are not essential courses.

Thank you.

CodePudding user response:

You can first pivot_longer your data, then join it with your df_courses, change to "completed where necessary and the pivot_wider again.

library(tidyverse)
df_summary %>%
  pivot_longer(-employees,
               names_to = "course") %>%
  left_join(df_courses, by = c("employees", "course")) %>%
  mutate(value = if_else(completion == 1 & value != "Not Applicable", "Completed", value, missing = value)) %>%
  pivot_wider(names_from = course,
              id_cols = employees)

which gives:

# A tibble: 3 x 6
  employees Course_A       Course_B      Course_C      Course_D      Course_E     
  <chr>     <chr>          <chr>         <chr>         <chr>         <chr>        
1 Apple     Completed      Not Completed Completed     Not Completed Not Completed
2 Betty     Completed      Completed     Not Completed Not Completed Not Completed
3 Carol     Not Applicable Not Completed Not Completed Not Completed Completed 

CodePudding user response:

here is a data.table approach

library(data.table)
# Set the df's to a data.table format
setDT(df_courses)
setDT(df_summary)
# Melt the smmary to long format
ans <- melt(df_summary, id.vars = "employees")
# perform an update join on the molten set
ans[!value == "Not Applicable", 
    value := ans[!value == "Not Applicable", ][df_courses[completion == 1,], 
                                              value := "Completed", 
                                              on = .(employees, variable = course)]$value]
# Cast to wide format again
dcast(ans, employees ~ variable, value.var = "value")
#    employees       Course_A      Course_B      Course_C      Course_D      Course_E
# 1:     Apple      Completed Not Completed     Completed Not Completed Not Completed
# 2:     Betty      Completed     Completed Not Completed Not Completed Not Completed
# 3:     Carol Not Applicable Not Completed Not Completed Not Completed     Completed

CodePudding user response:

And here's a base R option -

#Keep data only for completed courses
complete_course <- subset(df_courses, completion == 1)

#Create a row/column matrix to change value
mat <- cbind(match(complete_course$employees, df_summary$employees), 
             match(complete_course$course, names(df_summary)))
#Drop the 'Not applicable' values
df_summary[mat[df_summary[mat] != "Not Applicable", ]] <- 'Completed'
df_summary

#  employees       Course_A      Course_B      Course_C      Course_D      Course_E
#1     Apple      Completed Not Completed     Completed Not Completed Not Completed
#2     Betty      Completed     Completed Not Completed Not Completed Not Completed
#3     Carol Not Applicable Not Completed Not Completed Not Completed     Completed
  • Related