I am a quite unexperienced R user facing the following problem:
I would like to merge two data tables dt1 and dt2. dt1 contains 1 variable entitled Assessment. dt2 contains 2 variables entitled ID and Frequency. Now, I would like to have also the Assessment observations in dt2.
For simplicity, consider this example:
library(dplyr)
library(data.table)
dt1 <- data.table(c("perfect", "perfect", "okay", "unsufficient", "good", "good", "okay", "perfect"))
colnames(dt1) <- "Assessment"
dt2 <- data.table(cbind(c(1,2,3,4,5,6),c(1,3,1,1,1,1)))
colnames(dt2) <- c("ID", "Frequency")
Hence, dt1 looks like that:
Assessment |
---|
perfect |
perfect |
okay |
unsufficient |
good |
good |
okay |
perfect |
dt2 looks like that:
ID | Frequency |
---|---|
1 | 1 |
2 | 3 |
3 | 1 |
4 | 1 |
5 | 1 |
6 | 1 |
My aim would be to get something like:
ID | Frequency | Assessment |
---|---|---|
1 | 1 | perfect |
2 | 3 | perfect;okay;unsufficient |
3 | 1 | good |
4 | 1 | good |
5 | 1 | okay |
6 | 1 | perfect |
I do not have any idea how to come here and would appreciate each help very much! Thanks a lot!
CodePudding user response:
dt1 %>%
bind_cols(
dt2 %>%
uncount(Frequency)
) %>%
group_by(ID) %>%
summarise(Assessment = paste0(Assessment,collapse = ";"))
# A tibble: 6 x 2
ID Assessment
<dbl> <chr>
1 1 perfect
2 2 perfect;okay;unsufficient
3 3 good
4 4 good
5 5 okay
6 6 perfect
CodePudding user response:
If you trust the right order, as you say in OP, you can rep.int
the IDs according to their frequencies.
dt2[dt1[, list(Assessment=toString(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))], on=.(ID)]
# ID Frequency Assessment
# 1: 1 1 perfect
# 2: 2 3 perfect, okay, unsufficient
# 3: 3 1 good
# 4: 4 1 good
# 5: 5 1 okay
# 6: 6 1 perfect
or
dt2[dt1[, list(Assessment=list(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))], on=.(ID)]
# ID Frequency Assessment
# 1: 1 1 perfect
# 2: 2 3 perfect,okay,unsufficient
# 3: 3 1 good
# 4: 4 1 good
# 5: 5 1 okay
# 6: 6 1 perfect
The difference is, in second version Assessment
is a list column.
Note: if dt2
doesn't contain anything else, there's no need to merge anymore and it simplifies to
dt1[, list(Assessment=toString(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))]
# ID Assessment
# 1: 1 perfect
# 2: 2 perfect, okay, unsufficient
# 3: 3 good
# 4: 4 good
# 5: 5 okay
# 6: 6 perfect