I'm trying to get a list of column names that have been added after the initial CSV load. If I never update the variable after the columns are added, how are the new names ending up in it?
I would expect only Name and Age to be printed from my_cols, but it is printing isJon as well.
library(data.table)
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
df <- data.table(Name, Age)
my_cols <- colnames(df)
print(my_cols)
df[,isJon:=ifelse(Name=="Jon", 1, 0)]
print(my_cols)
Answer:
There are at least two things going on here:
1. R is inherently lazy with objects. When you run
my_cols <- colnames(df)
nothing is being changed, so R does not create a duplicate vector of names. Only when you do something to the vector of names that "could" change it does R copy the vector out of the frame's attributes into a new one, which then no longer changes when the original frame is updated.
2. data.table tends to do things in place with its reference semantics, so when it adds a column, the internal storage of column names is appended in place, contrary to R's normal way of doing things. Normally, changing a data.frame creates a new vector of names when you add a column.
Compare base::data.frame, where adding a column creates a new vector of column names, so our my_cols does not magically stay updated:
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
df <- data.frame(Name, Age)
my_cols <- colnames(df)
print(my_cols)
# [1] "Name" "Age"
df <- transform(df, isJon=ifelse(Name=="Jon", 1, 0))
print(my_cols)
# [1] "Name" "Age"
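One way to look at the sharing described above is to compare memory addresses. The following is only a rough sketch: it assumes data.table's address() helper and uses the illustrative names dt, dt_cols and df_cols, which are not from the question. The exact sharing behaviour may vary across R and data.table versions, so the printed values are left for you to inspect rather than stated here.

```r
library(data.table)

dt <- data.table(Name = c("Jon", "Bill"), Age = c(23, 41))
df <- data.frame(Name = c("Jon", "Bill"), Age = c(23, 41))

dt_cols <- names(dt)   # plain assignment, nothing forces a copy
df_cols <- names(df)

# address() reports where an object lives in memory; equal addresses
# suggest both names point at the same underlying vector.
print(address(dt_cols) == address(names(dt)))

dt[, isJon := as.integer(Name == "Jon")]   # data.table adds the column by reference
df$isJon <- as.integer(df$Name == "Jon")   # base R gives df a new names vector

print(dt_cols)   # may now include "isJon", as observed in the question
print(df_cols)   # still the two original names
```

If the first print shows TRUE, dt_cols and the table's own names are the same object, which is why a later by-reference change can show through.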
There are a couple of ways you can get these two things to work in the direction you were heading:
1. copy the vector, which forces it to be a new copy (yes, good name) of the vector:
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
df <- data.table(Name, Age)
my_cols <- copy(colnames(df))
print(my_cols)
# [1] "Name" "Age"
df[,isJon:=ifelse(Name=="Jon", 1, 0)]
print(my_cols)
# [1] "Name" "Age"
2. Do "something" to the vector, making R think it should copy-on-write:
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
df <- data.table(Name, Age)
my_cols <- colnames(df)[]
print(my_cols)
# [1] "Name" "Age"
df[,isJon:=ifelse(Name=="Jon", 1, 0)]
print(my_cols)
# [1] "Name" "Age"
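To tie this back to the original goal of listing the columns added after the initial load, a minimal sketch could snapshot the names with copy() and compare later with setdiff(). The names dt, initial_cols, added_cols and the extra AgeNextYear column are made up for illustration.

```r
library(data.table)

dt <- data.table(Name = c("Jon", "Bill", "Maria"), Age = c(23, 41, 32))

# Snapshot the names right after loading; copy() detaches the vector
# from the table, so later by-reference changes cannot reach it.
initial_cols <- copy(names(dt))

# Add columns by reference, as in the question.
dt[, isJon := as.integer(Name == "Jon")]
dt[, AgeNextYear := Age + 1]

# Columns present now that were not in the snapshot.
added_cols <- setdiff(names(dt), initial_cols)
print(added_cols)
# [1] "isJon"       "AgeNextYear"
```

setdiff() keeps the order of its first argument, so the added columns come back in the order they appear in the table.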