In the working example below, the operations dt[i, j] and dt[i][, j] do not produce the same output. I had assumed that chaining made no difference, but clearly it does. What is happening under the hood, and why is this the intended behavior?
library(data.table)
library(lubridate)
# Order A
dates <- as.Date(c("2022-08-18", "2022-10-20"))
db <- data.frame(id = 1:2, d = dates)
dt <- data.table(db)
dt[, month := months(dates)]
print(dt) # month is character
dt[id == 1][, month := lubridate::month(d)]
print(dt) # month remains character
dt[id == 1, month := lubridate::month(d)]
print(dt) # month is numeric
# Order B
dates <- as.Date(c("2022-08-18", "2022-10-20"))
db <- data.frame(id = 1:2, d = dates)
dt <- data.table(db)
dt[, month := months(dates)]
print(dt) # month is character
dt[id == 1, month := lubridate::month(d)]
print(dt) # month is numeric
dt[id == 1][, month := lubridate::month(d)]
print(dt) # month remains numeric
Answer:
:= modifies the input object by reference; see reference semantics. You can observe this using tracemem():
tracemem(dt)
#[1] "<000001CEC9999820>"
tracemem(dt[, month := months(dates)])
#[1] "<000001CEC9999820>"
# Same address
However, in dt[id == 1][, month := lubridate::month(d)], the input object is dt[id == 1], a subset of dt that is a new object with a different memory address:
tracemem(dt[id == 1])
#[1] "<000001CEC939EBB0>"
# Different from dt's address; changes made to this object by reference don't modify dt
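To make the difference concrete, here is a minimal sketch contrasting the two forms. The column names m_chained and m_direct and the variable sub are made up for illustration, and base format() is used in place of lubridate::month() so that the snippet only needs data.table. The chained form only changes the temporary subset, so its result has to be captured in a variable to be useful, while the single-call form updates dt itself.

```r
library(data.table)

dt <- data.table(id = 1:2, d = as.Date(c("2022-08-18", "2022-10-20")))

# Chained form: dt[id == 1] returns a new data.table, and := then modifies
# that new object by reference. dt itself is untouched.
sub <- dt[id == 1][, m_chained := as.integer(format(d, "%m"))]
print(names(sub))  # includes m_chained
print(names(dt))   # does not include m_chained

# Single-call form: i only selects which rows of dt receive the assignment,
# so the new column is added to dt itself (unselected rows are filled with NA).
dt[id == 1, m_direct := as.integer(format(d, "%m"))]
print(dt)
```

If the goal is to update only some rows of dt, put the row filter and the assignment in the same call. If a modified subset is wanted instead, assign the result of the chain to a new name, as sub does above.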