I have a dataframe to show the completion of course by members.
employees <- c("Apple", "Apple", "Apple", "Betty", "Betty", "Carol", "Carol", "Carol")
course <- c("Course_A", "Course_F", "Course_C", "Course_B", "Course_A", "Course_D", "Course_E", "Course_A")
completion <- c(1, 0, 1, 1, 1, 0, 1, 1)
df_courses <- data.frame(employees, course, completion)
I have another dataframe which summarises the completion course of members
employees <- c("Apple", "Betty", "Carol")
Course_A <- c("Not Completed", "Not Completed", "Not Applicable")
Course_B <- c("Not Completed", "Not Completed", "Not Completed")
Course_C <- c("Not Completed", "Not Completed", "Not Completed")
Course_D <- c("Not Completed", "Not Completed", "Not Completed")
Course_E <- c("Not Completed", "Not Completed", "Not Completed")
df_summary <- data.frame(employees, Course_A, Course_B, Course_C, Course_D, Course_E)
I need to update df_summary with the word "Completed" if the completion shows 1 in df_courses and the original text in df_summary is not "Not Applicable".
Do note that some courses in df_courses may not appear in df_summary as they are not essential courses.
Thank you.
CodePudding user response:
You can first pivot_longer
your data, then join it with your df_courses
, change to "completed where necessary and the pivot_wider
again.
library(tidyverse)
df_summary %>%
pivot_longer(-employees,
names_to = "course") %>%
left_join(df_courses, by = c("employees", "course")) %>%
mutate(value = if_else(completion == 1 & value != "Not Applicable", "Completed", value, missing = value)) %>%
pivot_wider(names_from = course,
id_cols = employees)
which gives:
# A tibble: 3 x 6
employees Course_A Course_B Course_C Course_D Course_E
<chr> <chr> <chr> <chr> <chr> <chr>
1 Apple Completed Not Completed Completed Not Completed Not Completed
2 Betty Completed Completed Not Completed Not Completed Not Completed
3 Carol Not Applicable Not Completed Not Completed Not Completed Completed
CodePudding user response:
here is a data.table
approach
library(data.table)
# Set the df's to a data.table format
setDT(df_courses)
setDT(df_summary)
# Melt the smmary to long format
ans <- melt(df_summary, id.vars = "employees")
# perform an update join on the molten set
ans[!value == "Not Applicable",
value := ans[!value == "Not Applicable", ][df_courses[completion == 1,],
value := "Completed",
on = .(employees, variable = course)]$value]
# Cast to wide format again
dcast(ans, employees ~ variable, value.var = "value")
# employees Course_A Course_B Course_C Course_D Course_E
# 1: Apple Completed Not Completed Completed Not Completed Not Completed
# 2: Betty Completed Completed Not Completed Not Completed Not Completed
# 3: Carol Not Applicable Not Completed Not Completed Not Completed Completed
CodePudding user response:
And here's a base R option -
#Keep data only for completed courses
complete_course <- subset(df_courses, completion == 1)
#Create a row/column matrix to change value
mat <- cbind(match(complete_course$employees, df_summary$employees),
match(complete_course$course, names(df_summary)))
#Drop the 'Not applicable' values
df_summary[mat[df_summary[mat] != "Not Applicable", ]] <- 'Completed'
df_summary
# employees Course_A Course_B Course_C Course_D Course_E
#1 Apple Completed Not Completed Completed Not Completed Not Completed
#2 Betty Completed Completed Not Completed Not Completed Not Completed
#3 Carol Not Applicable Not Completed Not Completed Not Completed Completed