I wrote the following R function to create dummy variables.
For example, I made a dataset (dt, with a single variable "var") and used this function to create a new variable ("dummy"), which is the quartile dummy variable of "var". However, after running the function, dt still contains only "var"; the new variable is not added to the dataset.
How can I add the new variable from an R function to the dataset? Or is it not a good idea to create new variables with an R function?
library(data.table)
dv <- function(dummy, variable, n) {
  nn <- n - 1
  dummy <- cut(variable,
               quantile(variable,
                        probs = seq(0, 1, 1/n),
                        na.rm = TRUE
               ),
               labels = c(0:nn),
               include.lowest = TRUE
  )
  tapply(variable, dummy, summary)
}
set.seed(1234)
dt <- data.table(var = runif(20, min = 0, max = 100))
dv(dt$dummy, dt$var, 4)
CodePudding user response:
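If I understand correctly, this is what you want to do.
Note that I edited your function: it no longer takes a dummy argument, and it returns the dummy vector so that it can be assigned to a new column.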
library(data.table)
dv <- function(variable, n) {
  nn <- n - 1
  # cut the variable at its n-quantiles and label the groups 0..(n - 1)
  dummy <- cut(variable,
               quantile(variable,
                        probs = seq(0, 1, 1/n),
                        na.rm = TRUE
               ),
               labels = 0:nn,
               include.lowest = TRUE
  )
  return(dummy)
}
set.seed(1234)
dt <- data.table(var = runif(20, min = 0, max = 100))
# add the new column by reference; var is looked up inside dt
dt[, dummy := dv(var, 4)]
> dt
var dummy
1: 11.3703411 0
2: 62.2299405 2
3: 60.9274733 2
4: 62.3379442 2
5: 86.0915384 3
6: 64.0310605 2
7: 0.9495756 0
8: 23.2550506 0
9: 66.6083758 3
10: 51.4251141 1
11: 69.3591292 3
12: 54.4974836 2
13: 28.2733584 1
14: 92.3433484 3
15: 29.2315840 1
16: 83.7295628 3
17: 28.6223285 1
18: 26.6820780 1
19: 18.6722790 0
20: 23.2225911 0
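As for why the original version left dt unchanged: assigning to an argument inside an R function only rebinds a local name, so the caller's object is not modified. Below is a minimal sketch of that behaviour in base R; the names add_local and add_returned are made up for this illustration and are not part of the question's code.

```r
# Hypothetical helper: assigns to its argument, but only the local copy changes.
add_local <- function(x) {
  x <- x + 1      # rebinds the local name x; the caller's object is untouched
  invisible(NULL)
}

# Hypothetical helper: returns the new value so the caller can store it.
add_returned <- function(x) {
  x + 1
}

v <- c(1, 2, 3)
add_local(v)
v_after_local <- v                   # still c(1, 2, 3)

v_after_returned <- add_returned(v)  # caller keeps the result explicitly
```

This is why the edited dv() returns the dummy vector and leaves the assignment to the caller.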
CodePudding user response:
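Assign the function's return value to a new column. This requires dv to return the dummy vector (as in the edited two-argument version above), not the tapply() summary:
dt$newColname <- dv(dt$var, 4)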
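If you would rather have a function that adds the column itself, one option is to pass in the whole dataset and return the modified copy. The following is only a sketch in base R, assuming a plain data.frame and quantile breaks that are all distinct (cut() stops with an error on duplicated breaks); add_quantile_dummy and its parameter names are hypothetical.

```r
# Hypothetical helper: returns a copy of `data` with a quantile-group column added.
# Assumes `data` is a data.frame and the quantile breaks are unique.
add_quantile_dummy <- function(data, source_col, new_col, n) {
  x <- data[[source_col]]
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  data[[new_col]] <- cut(x,
                         breaks = breaks,
                         labels = 0:(n - 1),
                         include.lowest = TRUE)
  data  # the caller must store the returned data frame
}

set.seed(1234)
df <- data.frame(var = runif(20, min = 0, max = 100))
df <- add_quantile_dummy(df, "var", "dummy", 4)
```

As with the one-line assignment, the result has to be assigned back (df <- ...), because the function works on its own copy of the data frame.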