I currently have two .csv files that look like this:
File 1:
Attempt | Result |
---|---|
Intervention 1 | B |
Intervention 2 | H |
and File 2:
Name | Outcome 1 | Outcome 2 | Outcome 3 |
---|---|---|---|
Sample 1 | A | B | C |
Sample 2 | D | E | F |
Sample 3 | G | H | I |
I would like to merge and align the two .csvs such that the result each row of File 1 is aligned by its "result" cell, against any of the three "outcome" columns in File 2, leaving blanks or "NA"s if there are no similarities.
Ideally, would look like this:
Attempt | Result | Name | Outcome 1 | Outcome 2 | Outcome 3 |
---|---|---|---|---|---|
Intervention 1 | B | Sample 1 | A | B | C |
Sample 2 | D | E | F | ||
Intervention 2 | H | Sample 3 | G | H | I |
I've looked and only found answers when merging two .csv files with one common column. Any help would be very appreciated.
CodePudding user response:
I will assume that " Result " in File 1 is unique, since more File 1 rows with same result value (i.e "B") will force us to consider new columns in the final data frame.
By this way,
Attempt <- c("Intervention 1","Intervention 2")
Result <- c("B","H")
df1 <- as.data.frame(cbind(Attempt,Result))
one <- c("Sample 1","A","B","C")
two <- c("Sample 2","D","E","F")
three <- c("Sample 3","G","H","I")
df2 <- as.data.frame(rbind(one,two,three))
row.names(df2) <- 1:3
colnames(df2) <- c("Name","Outcome 1","Outcome 2","Outcome 3")
vec_at <- rep(NA,nrow(df2));vec_res <- rep(NA,nrow(df2)); # Define NA vectors
for (j in 1:nrow(df2)){
a <- which(is.element(df1$Result,df2[j,2:4])==TRUE) # Row names which satisfy same element in two dataframes?
if (length(a>=1)){ # Don't forget that "a" may not be a valid index if no element satify the condition
vec_at[j] <- df1$Attempt[a] #just create a vector with wanted information
vec_res[j] <- df1$Result[a]
}
}
desired_df <- as.data.frame(cbind(vec_at,vec_res,df2)) # define your wanted data frame
Output:
vec_at vec_res Name Outcome 1 Outcome 2 Outcome 3
1 Intervention 1 B Sample 1 A B C
2 <NA> <NA> Sample 2 D E F
3 Intervention 2 H Sample 3 G H I
CodePudding user response:
I wonder if you could use fuzzyjoin
for something like this.
Here, you can provide a custom function for matching between the two data.frames.
library(fuzzyjoin)
fuzzy_left_join(
df2,
df1,
match_fun = NULL,
multi_by = list(x = paste0("Outcome_", 1:3), y = "Result"),
multi_match_fun = function(x, y) {
y == x[, "Outcome_1"] | y == x[, "Outcome_2"] | y == x[, "Outcome_3"]
}
)
Output
Name Outcome_1 Outcome_2 Outcome_3 Attempt Result
1 Sample_1 A B C Intervention_1 B
2 Sample_2 D E F <NA> <NA>
3 Sample_3 G H I Intervention_2 H