Why are the column names of a dataframe getting changed automatically?

I wanted to estimate the parameters for a linear regression model in R. The model is of the type: y = alpha + beta*x + epsilon. The task required me to place the values of the parameters systematically in a data frame. I thus created a blank data frame, and then went on appending rows to it for the values of the parameters.

df<-data.frame(alpha=double(),beta=double()) #blank dataframe
for(i in 1:1000)
{
    sample_dat<-sampling_model(100,2,5,16,-2,2) #generating 100 samples
    sample_model<-lm(y~x,data=sample_dat) #estimating the linear model
    df<-rbind(df,sample_model$coefficients) #appending the values of the parameters
}

Basically, I have a function sampling_model that generates random values for the x_i's and epsilon_i's (each following some distribution) and computes the y_i's from them using fixed values of alpha and beta.

In each iteration of the above loop, we get a pair of estimates of the parameters (alpha and beta) by fitting a linear model to the sample. I want to store them in a data frame, which I've named df.

Initially (before starting the loop), names(df) returned:

#[1] "alpha" "beta"

However, after appending all those estimates of alpha and beta to df (i.e. after the loop), names(df) returned:

#[1] "X2.4932268478702"  "X5.53432974825338"

I am stuck here, asking myself why this is happening. Note that these names are not constant either: if I run the above loop one more time and then check the column names, the numbers are all different. Is something overflowing, or have I made some mistake in appending the values to the data frame?

Also, I can (and did) get around this problem of 'ambiguous' names simply by:

names(df)<-c('alpha','beta')

But this does not hide the fact that I did something wrong while appending the estimated parameters to df, and I cannot figure out what. Can anyone help me understand how this can be avoided?

I am also attaching my sampling_model function for convenience:

sampling_model<-function(n,alpha,beta,variance,min_range,max_range)
{
    x<-runif(n,min=min_range,max=max_range) #n uniform variates as x_i
    epsilon<-rnorm(n,mean=0,sd=sqrt(variance)) #n normal variates as epsilon_i
    y<-alpha+beta*x+epsilon #the dependent variable y
    return(data.frame(x=x,y=y)) #returns dataframe of x and y
}

User response:

I'm not sure why this happens. It's strange behavior and seems to occur only when the first rbind argument has no rows. In any case, rbinding data frames together in a loop is very inefficient and considered bad practice, so it should be avoided. It is famously the 2nd Circle of R Hell in The R Inferno.
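For what it's worth, the observation about an empty first argument is consistent with what the help page for rbind says about its data frame method: zero-row arguments are dropped before anything is combined. If that is what happens here, the empty df contributes no column names on the first iteration, and the names end up being built from the bare coefficient vector instead. The sketch below tries to reproduce this in isolation; the values in est are made up to stand in for sample_model$coefficients, and bad and good are just illustrative names.

```r
# Empty data frame with the intended column names, as in the question
df <- data.frame(alpha = double(), beta = double())

# Stand-in for sample_model$coefficients (illustrative values, not real estimates)
est <- c("(Intercept)" = 2.5, x = 5.5)

# Appending the bare vector to the zero-row data frame loses the names
bad <- rbind(df, est)
names(bad)   # no longer "alpha" "beta"

# Appending a one-row data frame with explicit names keeps them
good <- rbind(df, data.frame(alpha = est[[1]], beta = est[[2]]))
names(good)  # "alpha" "beta"
```

This only shows where the names go missing. It does not make growing a data frame inside a loop any more efficient.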

The simplest alternative is to initialize your data to the full size, and then fill in each row:

n <- 1000
df <- data.frame(alpha=double(n),beta=double(n)) #preallocated data frame: n rows of zeros
for(i in 1:n)
{
    sample_dat <- sampling_model(100,2,5,16,-2,2) #generating 100 samples
    sample_model <- lm(y~x,data=sample_dat) #estimating the linear model
    df[i, ] <- sample_model$coefficients #filling in the values of the parameters
}
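
Another option, if you would rather not index into a data frame row by row, is to collect the coefficients with vapply and build the data frame once at the end. This is only a sketch: sampling_model is copied from the question with the plus signs written out, while set.seed, the name n_sim and the use of coef() in place of $coefficients are my own choices.

```r
set.seed(1)  # assumption: only here so the sketch is reproducible

# sampling_model() as in the question, with the "+" signs written out
sampling_model <- function(n, alpha, beta, variance, min_range, max_range) {
  x <- runif(n, min = min_range, max = max_range)
  epsilon <- rnorm(n, mean = 0, sd = sqrt(variance))
  data.frame(x = x, y = alpha + beta * x + epsilon)
}

n_sim <- 1000  # number of simulated data sets, as in the question

# One fit per iteration; vapply() checks that each result is a length-2 numeric
est <- vapply(seq_len(n_sim), function(i) {
  sample_dat <- sampling_model(100, 2, 5, 16, -2, 2)
  unname(coef(lm(y ~ x, data = sample_dat)))
}, numeric(2))

# est is a 2 x n_sim matrix: row 1 holds the intercepts, row 2 the slopes
df <- data.frame(alpha = est[1, ], beta = est[2, ])
```

Because the column names are set once when the data frame is created, nothing inside the loop can change them.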