How to (efficiently) perform a Cartesian product on a key subset [R]

Suppose I have these data:

data1 <- read.delim(textConnection(
"id val1
1 blue
1 green
1 red
2 black
2 brown
2 white"
), sep=' ')

data2 <- read.delim(textConnection(
"id val2
1 cat
1 dog
1 fish
2 hat
2 coat
2 car"
), sep=' ')

I would like to compute every combination of val1 and val2 within each id: blue, green, and red paired with cat, dog, and fish for id=1, and black, brown, and white paired with hat, coat, and car for id=2. I could do it in a for loop with expand.grid and then build the output using rbind, but my actual data have several IDs and several vals, so that approach runs slowly.
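
A minimal sketch of the loop-based approach described above, assuming it looks roughly like this (the original loop is not shown, and the data are rebuilt with data.frame() only so the snippet runs on its own):

```r
# Stand-ins for the data above
data1 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                    val1 = c("blue", "green", "red", "black", "brown", "white"))
data2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                    val2 = c("cat", "dog", "fish", "hat", "coat", "car"))

result <- NULL
for (i in unique(data1$id)) {
  # All val1/val2 pairings for this id
  combos <- expand.grid(val1 = data1$val1[data1$id == i],
                        val2 = data2$val2[data2$id == i],
                        stringsAsFactors = FALSE)
  # rbind copies the accumulated result on every iteration,
  # which is the likely reason this gets slow as the number of ids grows
  result <- rbind(result, cbind(id = i, combos))
}
```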

CodePudding user response：

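It turns out that merge does this by default. When the key is duplicated in both data frames, it returns every pairing of the matching rows within each id: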

> merge(data1, data2, by='id')
   id  val1 val2
1   1  blue  cat
2   1  blue  dog
3   1  blue fish
4   1 green  cat
5   1 green  dog
6   1 green fish
7   1   red  cat
8   1   red  dog
9   1   red fish
10  2 black  hat
11  2 black coat
12  2 black  car
13  2 brown  hat
14  2 brown coat
15  2 brown  car
16  2 white  hat
17  2 white coat
18  2 white  car

CodePudding user response：

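In base R, we could use split on both datasets to create a list of values by 'id', apply expand.grid to the corresponding elements of the two lists with Map, and rbind the pieces if a single data frame is needed. Because Map pairs list elements by position, this assumes both datasets contain the same set of ids: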

Map(expand.grid, split(data1$val1, data1$id), split(data2$val2, data2$id))
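
If a single data frame is wanted, the per-id pieces can be stacked with do.call(rbind, ...). This is a sketch rather than a definitive version: it pairs the split lists by name instead of position (an added assumption, so that ids missing from one dataset are dropped), and xs, ys, ids and pieces are illustrative names.

```r
data1 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                    val1 = c("blue", "green", "red", "black", "brown", "white"))
data2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                    val2 = c("cat", "dog", "fish", "hat", "coat", "car"))

xs <- split(data1$val1, data1$id)
ys <- split(data2$val2, data2$id)

# Keep only ids present in both lists and pair them by name
ids <- intersect(names(xs), names(ys))

pieces <- Map(function(id, x, y) {
  # id comes from the list names, so it is a character string here
  data.frame(id = id,
             expand.grid(val1 = x, val2 = y, stringsAsFactors = FALSE))
}, ids, xs[ids], ys[ids])

# Stack the per-id data frames and drop the "1.1", "1.2", ... row names
result <- do.call(rbind, pieces)
rownames(result) <- NULL
```

Note that expand.grid varies its first argument fastest, so the row order within each id differs from the merge output.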

Or in data.table (note that setDT converts data1 to a data.table by reference):

library(data.table)
setDT(data1)[data2, on = .(id), allow.cartesian = TRUE]
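
A variant that leaves the original data frames untouched, sketched under the assumption that copying the inputs is acceptable: as.data.table() makes copies, and dt1, dt2, joined and merged are illustrative names. As far as I understand the data.table documentation, allow.cartesian = TRUE is required here because joins returning more than nrow(x) + nrow(i) rows are otherwise stopped as a likely mistake.

```r
library(data.table)

data1 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                    val1 = c("blue", "green", "red", "black", "brown", "white"))
data2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                    val2 = c("cat", "dog", "fish", "hat", "coat", "car"))

# as.data.table() returns copies, so data1 and data2 stay plain data frames
dt1 <- as.data.table(data1)
dt2 <- as.data.table(data2)

# Same join as above, run on the copies
joined <- dt1[dt2, on = .(id), allow.cartesian = TRUE]

# The data.table merge method gives the same set of rows
merged <- merge(dt1, dt2, by = "id", allow.cartesian = TRUE)
```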