How to (efficiently) perform a Cartesian product on a key subset [R]


Suppose I have these data:

data1 <- read.delim(textConnection(
"id val1
1 blue
1 green
1 red
2 black
2 brown
2 white"
), sep=' ')

data2 <- read.delim(textConnection(
"id val2
1 cat
1 dog
1 fish
2 hat
2 coat
2 car"
), sep=' ')

I would like to compute every combination of blue, green, and red with cat, dog, and fish for id = 1, and of black, brown, and white with hat, coat, and car for id = 2. In other words, I want the Cartesian product of val1 and val2 within each id (these are combinations, not permutations). I could do it in a for loop with expand.grid and build the output with rbind, but my actual data have many ids and many values per id, so the loop is slow.
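A frequent cause of a slow loop like this is calling rbind() on every iteration, which copies the whole accumulated result each time. The sketch below is one possible version of the loop described above, assuming the goal is only the per-id Cartesian product. It stores one expand.grid() result per id in a preallocated list and binds once at the end. The data are rebuilt with data.frame() so the block runs on its own, and the names `ids`, `pieces`, and `loop_result` are illustrative.

```r
# Same example data as above, built inline so this block is self-contained
data1 <- data.frame(id = rep(1:2, each = 3),
                    val1 = c("blue", "green", "red", "black", "brown", "white"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(id = rep(1:2, each = 3),
                    val2 = c("cat", "dog", "fish", "hat", "coat", "car"),
                    stringsAsFactors = FALSE)

# Illustrative names: only ids present in both tables produce output
ids <- intersect(unique(data1$id), unique(data2$id))
pieces <- vector("list", length(ids))  # preallocate instead of growing with rbind()

for (i in seq_along(ids)) {
  k <- ids[i]
  # Cartesian product of this id's val1 and val2 values
  pieces[[i]] <- expand.grid(id = k,
                             val1 = data1$val1[data1$id == k],
                             val2 = data2$val2[data2$id == k],
                             stringsAsFactors = FALSE)
}

# Bind all pieces in a single call at the end
loop_result <- do.call(rbind, pieces)
```

Note that expand.grid() varies its first non-constant argument fastest, so rows come out in a different order than a merge() result, but the set of (id, val1, val2) rows is the same.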

Answer:

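It turns out that merge does this by default: when a key value appears in several rows of both data frames, merge returns every pairing of those rows, so joining on id gives the Cartesian product within each id.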

> merge(data1, data2, by='id')
   id  val1 val2
1   1  blue  cat
2   1  blue  dog
3   1  blue fish
4   1 green  cat
5   1 green  dog
6   1 green fish
7   1   red  cat
8   1   red  dog
9   1   red fish
10  2 black  hat
11  2 black coat
12  2 black  car
13  2 brown  hat
14  2 brown coat
15  2 brown  car
16  2 white  hat
17  2 white coat
18  2 white  car
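Because each id contributes (rows of data1 with that id) × (rows of data2 with that id), the size of the merge() output can be predicted from the per-id counts: here 3·3 + 3·3 = 18. The sketch below checks this, assuming data shaped like the example. With the default `all = FALSE`, ids that appear in only one table are dropped, which is why only the common ids are counted. The names `n1`, `n2`, `common`, and `expected_rows` are illustrative.

```r
# Same example data as above, built inline so this block is self-contained
data1 <- data.frame(id = rep(1:2, each = 3),
                    val1 = c("blue", "green", "red", "black", "brown", "white"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(id = rep(1:2, each = 3),
                    val2 = c("cat", "dog", "fish", "hat", "coat", "car"),
                    stringsAsFactors = FALSE)

# Rows per id in each table (illustrative names)
n1 <- table(data1$id)
n2 <- table(data2$id)
common <- intersect(names(n1), names(n2))  # inner join keeps only shared ids

# Per-id Cartesian product sizes, summed over ids
expected_rows <- sum(n1[common] * n2[common])

merged <- merge(data1, data2, by = "id")
nrow(merged) == expected_rows
```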

Answer:

In base R, we could use split on both data sets to get a list of values per id, then use Map to apply expand.grid to the corresponding elements of the two lists, and rbind the results if a single data frame is needed:

Map(expand.grid, split(data1$val1, data1$id), split(data2$val2, data2$id))
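The one-liner above pairs the two lists by position, which is only correct when both tables contain exactly the same set of ids. A sketch of a slightly more defensive variant, assuming integer ids as in the example, aligns the lists by name, keeps an id column, and does the final rbind. The names `v1`, `v2`, `ids`, `pieces`, and `result` are illustrative.

```r
# Same example data as above, built inline so this block is self-contained
data1 <- data.frame(id = rep(1:2, each = 3),
                    val1 = c("blue", "green", "red", "black", "brown", "white"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(id = rep(1:2, each = 3),
                    val2 = c("cat", "dog", "fish", "hat", "coat", "car"),
                    stringsAsFactors = FALSE)

# Lists of values per id, named by id (illustrative names)
v1 <- split(data1$val1, data1$id)
v2 <- split(data2$val2, data2$id)
ids <- intersect(names(v1), names(v2))  # align by id name, not by position

# One expand.grid() per id, with the id kept as a column
pieces <- Map(function(k, a, b) expand.grid(id = k, val1 = a, val2 = b,
                                            stringsAsFactors = FALSE),
              ids, v1[ids], v2[ids])

result <- do.call(rbind, unname(pieces))  # unname() avoids row names like "1.1"
result$id <- as.integer(result$id)        # split() names are character; assumes integer ids
```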

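Or in data.table (note that setDT converts data1 to a data.table in place; allow.cartesian = TRUE is needed because the join returns more rows than the two inputs combined):

library(data.table)
setDT(data1)[data2, on = .(id), allow.cartesian = TRUE]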