Is there a way to specify the current row in R data.table?

The code below creates a minimal data.table, shows the approach I am using with a for loop, and prints the desired output.

library(data.table)

# example data.table where "ID" corresponds to the row index
df <- data.table(ID = 1:5, parent_ID = c(0,1,1,3,3), value = paste0(rep("S", 5),1:5))

# reorder the rows and drop one, so that ID no longer corresponds to the row index
df2 <- df[c(1, 4, 5, 3), .(ID, parent_ID, value)]

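# this loop works: for each row, find the row whose ID equals its parent_ID
# and copy that row's value (row 1 is skipped: its parent_ID 0 matches no ID)
for(i in 2:nrow(df2))
{
  df2[i, "parent_value"] <- df2[which(df2[,ID] %in% df2$parent_ID[i]), "value"]
}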
df2

Output:

   ID parent_ID value parent_value
1:  1         0    S1         <NA>
2:  4         3    S4           S3
3:  5         3    S5           S3
4:  3         1    S3           S1

My question is whether there is a way to do this in data.table without a for loop. My guess is that it would look something like the following, but I would need a way to reference the current row, hence the title question.


df2[, parent_value := df2[which(df2[,ID] %in% df2$parent_ID[i]), "value"]]

Any ideas appreciated.

Answer:

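Using match(), which finds the row position of each parent_ID in the ID column and uses those positions to index value: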

df2[, parent_value := value[match(parent_ID, ID)]]

   ID parent_ID value parent_value
1:  1         0    S1         <NA>
2:  4         3    S4           S3
3:  5         3    S5           S3
4:  3         1    S3           S1
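To make the match() step concrete, here is a minimal sketch that rebuilds the question's data and splits the one-liner into its two steps. It also shows a different technique for comparison, a data.table update join of df2 with itself; that join is not part of the answer above, just an equivalent formulation that is assumed here to fit the same data.

```r
library(data.table)

# Rebuild the question's data: after reordering, ID no longer equals the row index
df  <- data.table(ID = 1:5, parent_ID = c(0, 1, 1, 3, 3), value = paste0("S", 1:5))
df2 <- df[c(1, 4, 5, 3)]

# Step 1: for each row, the position in df2 whose ID equals that row's parent_ID.
# match() returns NA where no ID matches (parent_ID 0 in row 1).
idx <- match(df2$parent_ID, df2$ID)   # c(NA, 4, 4, 1)

# Step 2: index value with those positions; an NA position yields NA.
by_match <- df2$value[idx]            # c(NA, "S3", "S3", "S1")

# Alternative technique (not from the answer above): an update join of df2
# with itself, matching x's parent_ID against i's ID; i.value is the value
# column of the matched (parent) row. Rows with no parent are left as NA.
df2[df2, on = .(parent_ID = ID), parent_value := i.value]

stopifnot(identical(df2$parent_value, by_match))
print(df2)
```

Both versions are vectorized, so no per-row reference is needed: match() does the lookup for all rows at once, and the join does the same through data.table's keyed matching.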

Answer:

You can try this:

# .BY holds the current group's value of id, i.e. the current row number
f <- function(b) df2[which(df2$ID %in% df2$parent_ID[b$id])]$value
df2[, parent_value := f(.BY), by = .(id = 1:nrow(df2))]

Output:

   ID parent_ID value parent_value
1:  1         0    S1         <NA>
2:  4         3    S4           S3
3:  5         3    S5           S3
4:  3         1    S3           S1
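The lookup in f() uses which(), which returns a row position within df2 rather than an ID, so the value has to be read from df2 itself and not from the original df. A small check on the question's data (rebuilt here) shows what each choice returns for row 2 of df2, whose parent_ID is 3:

```r
library(data.table)

df  <- data.table(ID = 1:5, parent_ID = c(0, 1, 1, 3, 3), value = paste0("S", 1:5))
df2 <- df[c(1, 4, 5, 3)]

# Position of the row with ID 3 *within df2* (not the ID itself)
pos <- which(df2$ID %in% df2$parent_ID[2])   # 4

# Reading that position from df2 gives the parent's value ...
from_df2 <- df2[pos]$value                   # "S3"
# ... while reading it from the unreordered df returns an unrelated row
from_df  <- df[pos]$value                    # "S4"
```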

You can also use .I (as suggested in the comments by Severin), like this (the magrittr pipe is used for presentation clarity only):

library(magrittr)  # provides %>%

df2[, id := .I] %>% 
  .[, parent_value := df2[which(df2$ID %in% df2$parent_ID[id]), value], by = 1:nrow(df2)] %>% 
  .[, id := NULL] %>% 
  .[]
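The pipe is optional: because := modifies the table by reference and returns it, data.table's own chaining DT[...][...] gives the same sequence of steps. A minimal sketch of the .I approach without magrittr, assuming the question's data (rebuilt here) and grouping by the temporary id column so each group is a single row:

```r
library(data.table)

df  <- data.table(ID = 1:5, parent_ID = c(0, 1, 1, 3, 3), value = paste0("S", 1:5))
df2 <- df[c(1, 4, 5, 3)]

# .I is the row number; store it as id so that by = id makes one group per row.
# Inside j, id is that group's (i.e. the current row's) number.
# Row 1's lookup is empty (parent_ID 0), so its parent_value stays NA.
df2[, id := .I][
  , parent_value := df2[which(df2$ID %in% df2$parent_ID[id]), value], by = id][
  , id := NULL]

print(df2)
```

Grouping row by row still evaluates j once per row, so for larger tables the vectorized match() version in the first answer is likely to be faster.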