I have data as follows:
library(data.table)
all_questions <- fread("Variable_codes_2022 Variables_2022
Cat1_1 This_question
Cat1_2 Other_question
Cat2_1 One_question
Cat2_2 Another_question
Cat3_1 Some_question
Cat3_2 Extra_question
Cat3_3 This_question
Cat4_1 One_question
Cat4_2 Wrong_question")
# I have other datasets that have the same questions with a key:
dat1 <- fread("Variable_codes Variables Key
Cat1 This_question A1
Cat1 Other_question B3")
dat2 <- fread("Variable_codes Variables Key
Cat2 One_question A7
Cat2 Another_question C8")
I would like add the matching keys from dat1
and dat2
to all_questions
, which I tried to do as follows:
all_questions <- merge(all_questions, dat1, by.x="Variables_2022", by.y="Variables", all.x=TRUE)
Variables_2022 Variable_codes_2022 Variable_codes Key
1: Another_question Cat2_2 <NA> <NA>
2: Extra_question Cat3_2 <NA> <NA>
3: One_question Cat2_1 <NA> <NA>
4: One_question Cat4_1 <NA> <NA>
5: Other_question Cat1_2 Cat1 B3
6: Some_question Cat3_1 <NA> <NA>
7: This_question Cat1_1 Cat1 A1
8: This_question Cat3_3 Cat1 A1
9: Wrong_question Cat4_2 <NA> <NA>
The problem is that when I do it in this manner, when I try to merge the second data frame, the keys get spread over multiple columns with suffixes:
all_questions <- merge(all_questions, dat2, by.x="Variables_2022", by.y="Variables", all.x=TRUE)
Variables_2022 Variable_codes_2022 Variable_codes.x Key.x Variable_codes.y Key.y
1: Another_question Cat2_2 <NA> <NA> Cat2 C8
2: Extra_question Cat3_2 <NA> <NA> <NA> <NA>
3: One_question Cat2_1 <NA> <NA> Cat2 A7
4: One_question Cat4_1 <NA> <NA> Cat2 A7
5: Other_question Cat1_2 Cat1 B3 <NA> <NA>
6: Some_question Cat3_1 <NA> <NA> <NA> <NA>
7: This_question Cat1_1 Cat1 A1 <NA> <NA>
8: This_question Cat3_3 Cat1 A1 <NA> <NA>
9: Wrong_question Cat4_2 <NA> <NA> <NA> <NA>
How can I add the Key
s of dat1
and dat2
to all_questions
, in a way that this does not happen?
CodePudding user response:
We may do a join
library(data.table)
all_questions[dat1, Key1 := i.Key, on = .(Variables_2022 = Variables)]
all_questions[dat2, Key2 := i.Key, on = .(Variables_2022 = Variables)]
-output
> all_questions
Variable_codes_2022 Variables_2022 Key1 Key2
<char> <char> <char> <char>
1: Cat1_1 This_question A1 <NA>
2: Cat1_2 Other_question B3 <NA>
3: Cat2_1 One_question <NA> A7
4: Cat2_2 Another_question <NA> C8
5: Cat3_1 Some_question <NA> <NA>
6: Cat3_2 Extra_question <NA> <NA>
7: Cat3_3 This_question A1 <NA>
8: Cat4_1 One_question <NA> A7
9: Cat4_2 Wrong_question <NA> <NA>
If there are multiple datasets, do it in a loop and if it needs only a single 'Key' column
lst1 <- list(dat1, dat2)
all_questions[, Key := NA_character_]
for(i in seq_along(lst1)) all_questions[lst1[[i]],
Key := fcoalesce(i.Key, Key), on = .(Variables_2022 = Variables)]
-output
> all_questions
Variable_codes_2022 Variables_2022 Key
<char> <char> <char>
1: Cat1_1 This_question A1
2: Cat1_2 Other_question B3
3: Cat2_1 One_question A7
4: Cat2_2 Another_question C8
5: Cat3_1 Some_question <NA>
6: Cat3_2 Extra_question <NA>
7: Cat3_3 This_question A1
8: Cat4_1 One_question A7
9: Cat4_2 Wrong_question <NA>