data.table: Different result whether assignment is chained or within same square bracket as subsetti

Time:10-13

In the working example below, we see that the operations dt[i, j] and dt[i][,j] are not equivalent in their output. I would have assumed that chaining did not make a difference, but clearly it does. What is happening under the hood and why is this intended behavior?

library(data.table)
library(lubridate)

# Order A
dates <- as.Date(c("2022-08-18", "2022-10-20"))
db <- data.frame(id = 1:2, d = dates)
dt <- data.table(db)

dt[, month := months(dates)]
print(dt) # month is character

dt[id == 1][, month := lubridate::month(d)]
print(dt) # month remains character

dt[id == 1, month := lubridate::month(d)]
print(dt) # month is numeric



# Order B
dates <- as.Date(c("2022-08-18", "2022-10-20"))
db <- data.frame(id = 1:2, d = dates)
dt <- data.table(db)

dt[, month := months(dates)]
print(dt) # month is character

dt[id == 1, month := lubridate::month(d)]
print(dt) # month is numeric

dt[id == 1][, month := lubridate::month(d)]
print(dt) # month remains numeric

CodePudding user response:

:= modifies the input object by reference; see the "Reference semantics" vignette in the data.table documentation.

You can observe this using tracemem():

tracemem(dt)
#[1] "<000001CEC9999820>"
tracemem(dt[, month := months(dates)])
#[1] "<000001CEC9999820>"
# Same address

However, in dt[id == 1][, month := lubridate::month(d)], the input object of := is dt[id == 1], a subset of dt that is a new object with a different memory address:

tracemem(dt[id == 1])
#[1] "<000001CEC939EBB0>"
# Different from dt's address: := modifies this temporary copy by reference, so dt itself is unchanged
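The difference can also be checked without tracemem(). The following is a minimal sketch, not the only way to show it: the names tmp, aug and m are made up for illustration, and base R's format() stands in for lubridate::month() so that data.table is the only package needed.

```r
library(data.table)

dt <- data.table(id = 1:2, d = as.Date(c("2022-08-18", "2022-10-20")))

# Chained form: dt[id == 1] is a new object, so := modifies only that copy.
tmp <- dt[id == 1]
tmp[, m := as.integer(format(d, "%m"))]
copy_is_separate   <- address(tmp) != address(dt)  # the subset lives elsewhere
chained_changed_dt <- "m" %in% names(dt)           # dt did not gain column m

# Single-bracket form: i only selects rows; := writes into dt itself.
dt[id == 1, m := as.integer(format(d, "%m"))]
single_changed_dt <- "m" %in% names(dt)            # dt now has column m

# To keep a modified subset, assign the chained result to a name.
aug <- dt[id == 2][, m := as.integer(format(d, "%m"))]
```

If the goal is to update some rows of dt, the single-bracket form dt[i, j := value] is the one to use. If the goal is a separate, modified subset, the chained form works as long as its result is assigned, as with aug here.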
