library(data.table)
table1 <- data.frame(id1 = c(1324, 7822, 2324, 29, 9999, 1010),
class = c(1, 1, 1, 2, 2, 1),
type = c("A", "A", "A", "B", "C", "D"),
number = c(1, 2.5, 98, 100, 80, 50))
table2 <- data.frame(id2 = c(1992, 1987, 1998, 1998, 2000, 2000, 2000, 2010, 2012),
class = c(3, 3, 1, 1, 3, 1, 2, 5, 1),
type = c("B", "C", "D", "A", "D", "D", "C", "B", "A"),
min_number = c(0, 0, 34, 0, 20, 45, 5, 23, 1),
max_number = c(18, 18, 50, 100, 100, 100, 100, 9, 10))
I have 2 tables that look like this:
> table1
id1 class type number
1 1324 1 A 1.0
2 7822 1 A 2.5
3 2324 1 A 98.0
4 29 2 B 100.0
5 9999 2 C 80.0
6 1010 1 D 50.0
> table2
id2 class type min_number max_number
1 1992 3 B 0 18
2 1987 3 C 0 18
3 1998 1 D 34 50
4 1998 1 A 0 100
5 2000 3 D 20 100
6 2000 1 D 45 100
7 2000 2 C 5 100
8 2010 5 B 23 9
9 2012 1 A 1 10
I would like to merge the two tables based on the same class
and type
. In particular, I would like to extract the id2
s from table2
and store them in tab1_tab2
.
tab1_tab2 <- setDT(table1)[setDT(table2), on = c("class", "type"),
id2 := id2]
> tab1_tab2
id1 class type number id2
1: 1324 1 A 1.0 2012
2: 7822 1 A 2.5 2012
3: 2324 1 A 98.0 2012
4: 29 2 B 100.0 NA
5: 9999 2 C 80.0 2000
6: 1010 1 D 50.0 2000
Notice that for class=1
and type=A
, there are actually 2 id2
s: 1998
and 2012
. However, tab1_tab2
only stored 2012
.
I would like the final output to store all possible id2
s, each separated by ;
. It should look something like this:
> final_tab1_tab2
id1 class type number id2
1: 1324 1 A 1.0 1998;2012
2: 7822 1 A 2.5 1998;2012
3: 2324 1 A 98.0 1998;2012
4: 29 2 B 100.0 NA
5: 9999 2 C 80.0 2000
6: 1010 1 D 50.0 1998;2000
This is a toy example, so the tables are small. In reality, my tables have thousands of rows, so I want to avoid using a for loop.
CodePudding user response:
Another option:
setDT(table2)[setDT(table1),.(class,type,number,id1,id2=x.id2),on=("class", "type")][
,.(id2=.(id2)),by=.(class,type,id1,number)]
# class type id1 number id2
# <num> <char> <num> <num> <list>
# 1: 1 A 1324 1.0 1998,2012
# 2: 1 A 7822 2.5 1998,2012
# 3: 1 A 2324 98.0 1998,2012
# 4: 2 B 29 100.0 NA
# 5: 2 C 9999 80.0 2000
# 6: 1 D 1010 50.0 1998,2000
CodePudding user response:
You can do
table1[, id2 := table2[, .(id2 = paste(id2, collapse = ";")), by = .(class, type)][table1, on = .(class, type)]$id2]
# <-- 1) aggreagtion -->
# <-- 2) left-join -->
# 3) extract
# <-- 4) assignment by reference -->
Result
table1
# id1 class type number id2
#1: 1324 1 A 1.0 1998;2012
#2: 7822 1 A 2.5 1998;2012
#3: 2324 1 A 98.0 1998;2012
#4: 29 2 B 100.0 <NA>
#5: 9999 2 C 80.0 2000
#6: 1010 1 D 50.0 1998;2000
Above we
- aggregate
table2
, then - perform a left-join,
- extract new
id2
from the result of 2. - and assign
id2
by reference totable1
data
library(data.table)
table1 <- data.table(id1 = c(1324, 7822, 2324, 29, 9999, 1010),
class = c(1, 1, 1, 2, 2, 1),
type = c("A", "A", "A", "B", "C", "D"),
number = c(1, 2.5, 98, 100, 80, 50))
table2 <- data.table(id2 = c(1992, 1987, 1998, 1998, 2000, 2000, 2000, 2010, 2012),
class = c(3, 3, 1, 1, 3, 1, 2, 5, 1),
type = c("B", "C", "D", "A", "D", "D", "C", "B", "A"),
min_number = c(0, 0, 34, 0, 20, 45, 5, 23, 1),
max_number = c(18, 18, 50, 100, 100, 100, 100, 9, 10))