I have a table A of data.table type which contains a set of elements, I would like to add a new mass column to this table, randomly retrieve from another table B. I would like to do this with the data.table package, but I don't know how to optimze it.
Table A
element
----------
silver
chlorine
silver
chlorine
chlorine
chlorine
silver
Table B
mass element
------
0.3 silver
0.5 silver
1.6 silver
1.2 chlorine
5.3 chlorine
0.1 chlorine
Code to build tables:
tableA <- data.table(
element = c("silver","chlorine","silver", "chlorine","chlorine","chlorine","silver")
)
tableB <- data.table(
mass = c(0.3,0.5,1.6,1.2,5.3,0.1),
element = c("silver","silver","silver", "chlorine","chlorine","chlorine")
)
The volume of data being very important I would like to use on the data.table package
Expected result:
element mass
-----------------------
silver 1.6
chlorine 5.3
silver 0.3
chlorine 1.2
chlorine 1.2
chlorine 0.1
silver 1.6
This code returns me an error that can't fix, but do you think the approach is good and optimized?
tableA[, mass := tableB[sample(mass), on = .(element)]$mass]
CodePudding user response:
Here's one way,
set.seed(42)
tableA[tableB[, list(mass = list(mass)), by = element], mass := sapply(i.mass, sample, size = 1), on = .(element)]
tableA
# element mass
# <char> <num>
# 1: silver 0.3
# 2: chlorine 1.2
# 3: silver 0.3
# 4: chlorine 5.3
# 5: chlorine 5.3
# 6: chlorine 5.3
# 7: silver 0.3
CodePudding user response:
Here is an alternative data.table approach, if you are interested:
tableA[, mass:=sample(tableB[element==.BY$element,mass], .N, replace=T), by=element]
Output:
element mass
1: silver 0.3
2: chlorine 1.2
3: silver 1.6
4: chlorine 5.3
5: chlorine 5.3
6: chlorine 5.3
7: silver 0.5