The code below creates a minimal data.table, shows the approach I am using with a for loop, and prints the desired output.
library(data.table)
# example data.table where "ID" corresponds to the row index
df <- data.table(ID = 1:5, parent_ID = c(0,1,1,3,3), value = paste0(rep("S", 5),1:5))
# unsort the rows and remove one so that ID no longer corresponds to the row index
df2 <- df[c(1, 4,5,3), .(ID, parent_ID, value)]
# this method below works
for (i in 2:nrow(df2)) {
  df2[i, "parent_value"] <- df2[which(df2[, ID] %in% df2$parent_ID[i]), "value"]
}
df2
Output:
ID parent_ID value parent_value
1: 1 0 S1 <NA>
2: 4 3 S4 S3
3: 5 3 S5 S3
4: 3 1 S3 S1
My question is whether there is a different way to do this in data.table that avoids for loops. My guess is that it would look like the following, but it seems I need a way to reference the current row, hence the title question.
df2[, parent_value := df2[which(df2[,ID] %in% df2$parent_ID[i]), "value"]]
Any ideas appreciated.
CodePudding user response:
Using match:
df2[, parent_value := value[match(parent_ID, ID)]]
ID parent_ID value parent_value
1: 1 0 S1 <NA>
2: 4 3 S4 S3
3: 5 3 S5 S3
4: 3 1 S3 S1
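To see why this works, it may help to look at the intermediate result. Below is a minimal sketch that rebuilds the question's table by hand; the name pos is introduced here only for illustration.

```r
library(data.table)

# Rebuild the question's example table (rows already shuffled, ID 2 removed)
df2 <- data.table(ID        = c(1L, 4L, 5L, 3L),
                  parent_ID = c(0L, 3L, 3L, 1L),
                  value     = c("S1", "S4", "S5", "S3"))

# For each parent_ID, match() gives the row position of the first equal ID,
# or NA when no row has that ID (here: parent_ID 0)
pos <- match(df2$parent_ID, df2$ID)
pos
# NA 4 4 1

# Indexing value by those positions looks up each row's parent value
df2$value[pos]
# NA "S3" "S3" "S1"
```

Note that match() only returns the first matching position, so this relies on ID being unique.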
CodePudding user response:
You can try this:
# look up in df2 itself (not the original df): the positions from which() refer to df2's rows
f <- function(b) df2[which(df2$ID %in% df2$parent_ID[b$id])]$value
df2[, parent_value := f(.BY), by = .(id = 1:nrow(df2))]
Output:
ID parent_ID value parent_value
1: 1 0 S1 <NA>
2: 4 3 S4 S3
3: 5 3 S5 S3
4: 3 1 S3 S1
You can also use .I (as suggested in the comments by Severin), like this (magrittr pipe added for presentation clarity only):
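library(magrittr)
df2[, id := .I] %>%
  # index df2$value directly so that id refers to the current group's row number
  .[, parent_value := df2$value[which(df2$ID %in% df2$parent_ID[id])], by = 1:nrow(df2)] %>%
  .[, id := NULL] %>%
  .[]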
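For comparison, another loop-free option that neither answer above uses is an update join. This is only a sketch, assuming ID is unique; the table is rebuilt by hand and the name lookup is made up for illustration.

```r
library(data.table)

# Rebuild the question's example table
df2 <- data.table(ID        = c(1L, 4L, 5L, 3L),
                  parent_ID = c(0L, 3L, 3L, 1L),
                  value     = c("S1", "S4", "S5", "S3"))

# Hypothetical helper table: one row per ID with the value to look up
lookup <- df2[, .(ID, value)]

# Update join: for every row of lookup, find the rows of df2 whose parent_ID
# equals lookup's ID and copy lookup's value into them by reference.
# Rows without a matching parent (parent_ID 0 here) are left as NA.
df2[lookup, parent_value := i.value, on = .(parent_ID = ID)]
df2[]
```

The i. prefix picks the column from the table given in the first argument (lookup), which avoids any ambiguity with df2's own value column.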