R Data Table add rows to each group if not existing

Time:01-20

I have a data table with multiple groups. For each group, I'd like to add rows for the values in vals that are not already present in that group. Additional columns should be filled with NA.

library(data.table)

DT = data.table(group = c(1,1,1,2,2,3,3,3,3), val = c(1,2,4,2,3,1,2,3,4), somethingElse = rep(1,9))
vals = data.table(val = c(1,2,3,4))

What I want:

    group val somethingElse
 1:     1   1             1
 2:     1   2             1
 3:     1   3            NA
 4:     1   4             1
 5:     2   1            NA
 6:     2   2             1
 7:     2   3             1
 8:     2   4            NA
 9:     3   1             1
10:     3   2             1
11:     3   3             1
12:     3   4             1

The values of val do not have to be in increasing order; the missing values may also be appended at the beginning or end of each group.

I don't know how to approach this problem. I've thought about using rbindlist(..., fill = TRUE), but then the values would simply be appended. I think some expression like DT[, lapply(...), by = c("group")] might be useful here, but I have no idea how to check whether a value already exists in a group.
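The rbindlist(..., fill = TRUE) idea can be made to work if the "does this value already exist?" check is done with an anti-join first. The sketch below is an illustration, not code from the question: it builds every wanted (group, val) pair with CJ, keeps only the pairs with no match in DT (data.table's X[!Y, on = ...] anti-join), appends them with fill = TRUE, and then sorts so the new rows land inside their groups. The names wanted, missing_rows and res are just illustrative.

```r
library(data.table)

# Example data from the question
DT <- data.table(group = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 val = c(1, 2, 4, 2, 3, 1, 2, 3, 4),
                 somethingElse = rep(1, 9))
vals <- data.table(val = c(1, 2, 3, 4))

# Every (group, val) pair that should exist
wanted <- CJ(group = unique(DT$group), val = vals$val)

# Anti-join: keep only the pairs that have no matching row in DT
missing_rows <- wanted[!DT, on = .(group, val)]

# Append the missing pairs; fill = TRUE sets somethingElse to NA for them
res <- rbindlist(list(DT, missing_rows), fill = TRUE)

# Sort so the appended rows sit inside their group instead of at the end
setorder(res, group, val)
print(res)
```

The final setorder is what addresses the "values will simply be appended" concern; without it the three new rows would stay at the bottom of the table.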

Answer:

You can build a cross join (CJ) of every group with every value and join DT onto it:

setDT(DT)[
  CJ(group = group, val = val, unique = TRUE), 
  on = .(group, val)
]

    group val somethingElse
 1:     1   1             1
 2:     1   2             1
 3:     1   3            NA
 4:     1   4             1
 5:     2   1            NA
 6:     2   2             1
 7:     2   3             1
 8:     2   4            NA
 9:     3   1             1
10:     3   2             1
11:     3   3             1
12:     3   4             1
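One thing to be aware of: inside DT[...], the val passed to CJ is DT's own val column, so a value listed in vals that occurs in no group at all would never be added. It happens not to matter for this example data. A minimal variant, assuming the intent is to use vals as the source of values, builds the grid outside the join from unique(DT$group) and vals$val (the name grid is illustrative):

```r
library(data.table)

DT <- data.table(group = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 val = c(1, 2, 4, 2, 3, 1, 2, 3, 4),
                 somethingElse = rep(1, 9))
vals <- data.table(val = c(1, 2, 3, 4))

# Cross join the groups found in DT with the values listed in vals
grid <- CJ(group = unique(DT$group), val = vals$val)

# Right join: one output row per row of grid, NA where DT has no match
res <- DT[grid, on = .(group, val)]
print(res)
```

Because CJ sorts its output by default and the join returns rows in the order of grid, the result comes out ordered by group and then val.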

Answer:

I'll add this answer for a slightly more complex case, where the rows are matched on two columns:

#Raw Data
DT = data.table(group = c(1,1,2,2,2,3,3,3,3),
                  x = c(1,2,1,3,4,1,2,3,4),
                  y = c(2,4,2,6,8,2,4,6,8),
                  somethingElse = rep(1,9))

#allowed combinations of x and y
DTxy = data.table(x = c(1,2,3,4), y = c(2,4,6,8))

Here, I want to add every (x, y) combination from DTxy to each group in DT if it is not already present.

I wrote a function that works on the subset of each group:

#function to join a subset on two columns (here: x,y)
#right join: keeps every row of xy, NA where the subset has no match
DTxyJoin = function(sd, xy){
  sd[xy, on = .(x,y)]
}

I then applied the function to each group:

#add x and y to each group if missing
DTres = DT[, DTxyJoin(.SD, DTxy), by = c("group")]

The Result:

    group x y somethingElse
 1:     1 1 2             1
 2:     1 2 4             1
 3:     1 3 6            NA
 4:     1 4 8            NA
 5:     2 1 2             1
 6:     2 2 4            NA
 7:     2 3 6             1
 8:     2 4 8             1
 9:     3 1 2             1
10:     3 2 4             1
11:     3 3 6             1
12:     3 4 8             1
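The same result can also be sketched without a per-group function by first building the full grid of groups times allowed (x, y) rows and then doing a single join, which is the two-column analogue of the CJ approach in the other answer. This is an illustrative alternative, not the author's code: grouping DTxy by x and y and returning all groups repeats each allowed row once per group (this assumes the rows of DTxy are unique), and the names groups, grid and res are hypothetical.

```r
library(data.table)

DT <- data.table(group = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
                 x = c(1, 2, 1, 3, 4, 1, 2, 3, 4),
                 y = c(2, 4, 2, 6, 8, 2, 4, 6, 8),
                 somethingElse = rep(1, 9))
DTxy <- data.table(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))

# Pair every allowed (x, y) row with every group in DT:
# for each (x, y) group of DTxy, j returns one row per group value
groups <- unique(DT$group)
grid <- DTxy[, .(group = groups), by = .(x, y)]

# Right join on all three columns; missing combinations get NA
res <- DT[grid, on = .(group, x, y)]

# grid is ordered by x first, so sort back into group order
setorder(res, group, x)
print(res)
```

The trade-off is that grid holds one row per group and allowed combination, which is fine here but grows with the number of groups; the per-group function avoids materializing it up front.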