I have two dataframes. These are examples of my dataframes:
#dataframe 1
Department <- c("ABS", "ABS", "ABS", "POL", "DOT")
Division <- c("BDO", "BL", "BL", "FSB", "DMS")
Gender <- c("M", "M", "M", "F", "M")
Grade <- c("15", "12","12", "16", "M2")
Salary20 <- c("47", "43", "41", "71", "16")
Overtime20 <- c("17", "43", "63", "0", "0")
df1 <- data.frame(Department, Division, Gender, Grade, Salary20, Overtime20)
df1
#dataframe 2
Department <- c("ABS", "ABS", "ABS", "POL", "HHS")
Division <- c("BDO", "BL", "BL", "FSB", "BHC")
Gender <- c("M", "M", "M", "F","F")
Grade <- c("15", "12","12", "16", "24")
Salary21 <- c("49", "45", "43", "72", "100")
Overtime21 <- c("35", "11", "10", "0", "40")
df2 <- data.frame(Department, Division, Gender, Grade, Salary21, Overtime21)
df2
I'll like to merge or left join df2$Salary21
& df2$Overtime21
to df1
based on Divison Gender Grade
just like the dataframe below:
#Final dataframe
Department <- c("ABS", "ABS", "ABS", "POL", "DOT", "HHS")
Division <- c("BDO", "BL", "BL", "FSB", "DMS", "BHC")
Gender <- c("M", "M", "M", "F", "F", "F")
Grade <- c("15", "12","12", "16", "M2", "24")
Salary20 <- c("47", "43", "41", "71", "16", "0")
Overtime20 <- c("17", "43", "63", "0", "0", "0")
Salary21 <- c("49", "45", "43", "72", "0", "100")
Overtime21 <- c("35", "11", "10", "0", "0", "40")
df <- data.frame(Department, Division, Gender, Grade, Salary20, Overtime20, Salary21, Overtime21)
df
The problem is that there is no unique identifier (UID) and some rows are identical in Divison Gender Grade
(example: ABS rows).
If this possible to merge? If so how?
CodePudding user response:
I do not know of any way to accurately combine these dataframes without a unique identifier - as you mentioned, the identical rows mean that Division, Gender, and Grade are not enough.
I would try to add my own UID. The ease of this depends on how the data is formatted of course. I've included an example below to show how this then makes using merge or join easy.
#dataframe 1
Department <- c("ABS_1", "ABS_2", "ABS_3", "POL_1", "DOT_1")
Division <- c("BDO", "BL", "BL", "FSB", "DMS")
Gender <- c("M", "M", "M", "F", "M")
Grade <- c("15", "12","12", "16", "M2")
Salary20 <- c("47", "43", "41", "71", "16")
Overtime20 <- c("17", "43", "63", "0", "0")
df1 <- data.frame(Department, Division, Gender, Grade, Salary20, Overtime20)
df1
#dataframe 2
Department <- c("ABS_1", "ABS_2", "ABS_3", "POL_1", "HHS_1")
Division <- c("BDO", "BL", "BL", "FSB", "BHC")
Gender <- c("M", "M", "M", "F","F")
Grade <- c("15", "12","12", "16", "24")
Salary21 <- c("49", "45", "43", "72", "100")
Overtime21 <- c("35", "11", "10", "0", "40")
df2 <- data.frame(Department, Division, Gender, Grade, Salary21, Overtime21)
df2
# combine
df <- merge(df1,df2, by = c("Department","Division","Gender","Grade"), all = TRUE)
CodePudding user response:
You can do a common merge. However, for duplicate cases (example: Division BL Gender M Grade 12), R will create duplicate lines for each value it finds in df2, making combinations.
df1$id <- paste0(df1$Division,df1$Gender,df1$Grade)
str(df1)
df2$id <- paste0(df2$Division,df2$Gender,df2$Grade)
str(df2)
merge(df1,df2,by = "id", all.x = TRUE, all.y = FALSE)