I wanted to estimate the parameters of a linear regression model in R. The model is of the form: y = alpha + beta*x + epsilon. The task required me to store the parameter estimates systematically in a data frame. I therefore created a blank data frame and then appended a row to it for each pair of estimates.
df<-data.frame(alpha=double(),beta=double()) #blank dataframe
for(i in 1:1000)
{
sample_dat<-sampling_model(100,2,5,16,-2,2) #generating 100 samples
sample_model<-lm(y~x,data=sample_dat) #estimating the linear model
df<-rbind(df,sample_model$coefficients) #appending the values of the parameters
}
Basically, I have a function sampling_model that generates random values for the x_i's and epsilon_i's (each of which follows some distribution) and computes the y_i's by combining them with fixed values of alpha and beta.
In each iteration of the above loop, fitting a linear model to the sampled data gives a pair of estimates of the parameters (alpha and beta). I want to store them in a data frame, which I've named df.
Initially (before starting the loop), names(df) returned:
#[1] "alpha" "beta"
However, after appending all those estimates of alpha and beta to df (i.e. after the loop), names(df) returned:
#[1] "X2.4932268478702" "X5.53432974825338"
I am stuck here, wondering why this is happening. Note that these names are not constant either: if I run the above loop again and check the column names, the numbers are all different. Is something overflowing, or have I made a mistake in appending the values to the data frame?
Also, I can (and did) get around this problem of 'ambiguous' names simply by:
names(df)<-c('alpha','beta')
But this does not change the fact that I did something wrong while appending the estimated parameters to df, and I cannot figure out what. Can anyone explain how this can be avoided?
I am also attaching my sampling_model function for convenience:
sampling_model<-function(n,alpha,beta,variance,min_range,max_range)
{
x<-runif(n,min=min_range,max=max_range) #n uniform variates as x_i
epsilon<-rnorm(n,mean=0,sd=sqrt(variance)) #n normal variates as epsilon_i
y<-alpha+beta*x+epsilon #the dependent variable y
return(data.frame(x=x,y=y)) #returns dataframe of x and y
}
Answer:
I'm not sure why this happens; it's strange behavior that seems to occur only when the first rbind argument has no rows. But rbind-ing data frames together in a loop is inefficient, bad practice, and should be avoided. It is famously the 2nd Circle of R Hell in The R Inferno.
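The behavior can be reproduced without the simulation. As far as I can tell from the ?rbind help page, the data frame method first drops zero-row arguments, which would be consistent with the column names of the empty df being lost on the first append. The sketch below is only illustrative: the coefficient values and the names empty, coefs and new_row are made up for the example. It also shows that appending a one-row data frame with explicit column names keeps alpha and beta.

```r
# Empty data frame, as in the question
empty <- data.frame(alpha = double(), beta = double())

# Made-up stand-in for sample_model$coefficients (hypothetical values)
coefs <- c("(Intercept)" = 2.1, x = 4.9)

# Appending the bare vector: inspect what happens to the column names
bad <- rbind(empty, coefs)
print(names(bad))

# Appending a one-row data frame with explicit names instead
new_row <- data.frame(alpha = unname(coefs[1]), beta = unname(coefs[2]))
good <- rbind(empty, new_row)
good <- rbind(good, new_row)  # later appends match columns by name
print(names(good))
```

This only addresses the names; growing the data frame row by row is still slow for many iterations.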
The simplest alternative is to initialize your data to the full size, and then fill in each row:
n <- 1000
df <- data.frame(alpha=double(n),beta=double(n)) #preallocated data frame (n rows of zeros)
for(i in 1:n)
{
sample_dat <- sampling_model(100,2,5,16,-2,2) #generating 100 samples
sample_model <- lm(y~x,data=sample_dat) #estimating the linear model
df[i, ] <- sample_model$coefficients #filling in the values of the parameters
}
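Another option, offered here as a sketch rather than as the one right way, is to skip the data frame until the end: collect the estimates in a matrix with vapply and build the data frame once, with the column names given explicitly. The helper fit_once and the names n_sims and est are assumptions introduced for this example; sampling_model is the function from the question, repeated so the block runs on its own.

```r
set.seed(42)  # only to make the sketch reproducible

# Same simulator as in the question
sampling_model <- function(n, alpha, beta, variance, min_range, max_range) {
  x <- runif(n, min = min_range, max = max_range)
  epsilon <- rnorm(n, mean = 0, sd = sqrt(variance))
  y <- alpha + beta * x + epsilon
  data.frame(x = x, y = y)
}

# Hypothetical helper: fit one simulated data set and
# return the two estimates as a plain, unnamed numeric vector
fit_once <- function(i) {
  sample_dat <- sampling_model(100, 2, 5, 16, -2, 2)
  unname(coef(lm(y ~ x, data = sample_dat)))
}

n_sims <- 1000
est <- vapply(seq_len(n_sims), fit_once, numeric(2))  # 2 x n_sims matrix

# Build the data frame once, naming the columns explicitly
df <- data.frame(alpha = est[1, ], beta = est[2, ])
print(names(df))
print(colMeans(df))
```

Because vapply checks that every call returns a numeric vector of length 2, a failed or malformed fit stops with an error instead of silently producing an oddly shaped result.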