A "special" case of merging in R-CodePudding

I am a quite unexperienced R user facing the following problem:

I would like to merge two data tables dt1 and dt2. dt1 contains 1 variable entitled Assessment. dt2 contains 2 variables entitled ID and Frequency. Now, I would like to have also the Assessment observations in dt2.

For simplicity, consider this example:

library(dplyr)
library(data.table)

dt1 <- data.table(c("perfect", "perfect", "okay", "unsufficient", "good", "good", "okay", "perfect"))
colnames(dt1) <- "Assessment"

dt2 <- data.table(cbind(c(1,2,3,4,5,6),c(1,3,1,1,1,1)))
colnames(dt2) <- c("ID", "Frequency")

Hence, dt1 looks like that:

Assessment
perfect
perfect
okay
unsufficient
good
good
okay
perfect

dt2 looks like that:

ID	Frequency
1	1
2	3
3	1
4	1
5	1
6	1

My aim would be to get something like:

ID	Frequency	Assessment
1	1	perfect
2	3	perfect;okay;unsufficient
3	1	good
4	1	good
5	1	okay
6	1	perfect

I do not have any idea how to come here and would appreciate each help very much! Thanks a lot!

CodePudding user response：

dt1 %>% 
  bind_cols(
    dt2 %>% 
      uncount(Frequency) 
  ) %>% 
  group_by(ID) %>% 
  summarise(Assessment = paste0(Assessment,collapse = ";"))

# A tibble: 6 x 2
     ID Assessment               
  <dbl> <chr>                    
1     1 perfect                  
2     2 perfect;okay;unsufficient
3     3 good                     
4     4 good                     
5     5 okay                     
6     6 perfect

CodePudding user response：

If you trust the right order, as you say in OP, you can rep.int the IDs according to their frequencies.

dt2[dt1[, list(Assessment=toString(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))], on=.(ID)]
#    ID Frequency                  Assessment
# 1:  1         1                     perfect
# 2:  2         3 perfect, okay, unsufficient
# 3:  3         1                        good
# 4:  4         1                        good
# 5:  5         1                        okay
# 6:  6         1                     perfect

dt2[dt1[, list(Assessment=list(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))], on=.(ID)]
#    ID Frequency                Assessment
# 1:  1         1                   perfect
# 2:  2         3 perfect,okay,unsufficient
# 3:  3         1                      good
# 4:  4         1                      good
# 5:  5         1                      okay
# 6:  6         1                   perfect

The difference is, in second version Assessment is a list column.

Note: if dt2 doesn't contain anything else, there's no need to merge anymore and it simplifies to

dt1[, list(Assessment=toString(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))]
#    ID                  Assessment
# 1:  1                     perfect
# 2:  2 perfect, okay, unsufficient
# 3:  3                        good
# 4:  4                        good
# 5:  5                        okay
# 6:  6                     perfect