For-loop only appends the first value

Y_0 = pd.read_csv('Y_0.txt',header=None) #text file of size 100*10000

but I want to iterate over 10000 separate columns of 100 rows each, hence:

Y_0 = np.hsplit(Y_0,10000)

X = pd.read_csv('X.txt',header=None) #text file of size 100*9
mu = 0 #normal mean
sigma = 1 #normal standard deviation (the scale argument of np.random.normal)
n=100
B=10000
np.random.seed(123)

def DW(y,B):
    d_test = []
    for i in range(B): 
        y = y[i]
        numerator_d_hat = 0
        denominator_d_hat = 0
        for i in range(1,n):
            epsilon_hat = np.random.normal(mu, sigma, n)
            #print(epsilon_hat)
            betas = (A.dot(np.transpose(X)))*(epsilon_hat) 
            #print(X.dot(betas))
            epsilon_hat = y - X.dot(betas)
            epsilon_hat = epsilon_hat.iloc[:,0]
            numerator_d_hat  = (epsilon_hat[i] - epsilon_hat[i-1])**2
            denominator_d_hat  = epsilon_hat[i-1]**2
            value_d = numerator_d_hat / denominator_d_hat
        d_test.append(value_d)
    d_test_hats = np.array(d_test)
    d_test_hats.sort()
    return(d_test_hats)
print(DW(y,B))

I want to create a sorted array of all DW values, but it only works for the first 1x100. If I try other methods, it raises either KeyError: 1 or "invalid index to scalar variable".

I have searched the internet endlessly and tried many combinations to get y such that it works.

The values of d, epsilon_hat and betas are all fine if I do y = y[i] just once, outside of a loop. Whenever I do it inside the loop as shown above, I only get errors.

Does anyone know how I can write something like y = y[i] so that the values for all 10000 columns end up in the array d_test_hats, and not just one?

CodePudding user response：

Your problem is most likely the y = y[i] line. Don't reassign your y in the loop. I'm not sure what your y is, I assume it's a multidimensional array. Each time you iterate through your loop you remove one dimension from y.

Short explanation of where your code goes wrong:

y = [1,2,3]
for i in range(3):
    y = y[i]
    print(y)

will have the output:

1
TypeError: 'int' object is not subscriptable

Your KeyError says pretty much the same thing. I guess you have a NumPy scalar in hand, so your message is different, but the error is the same: you cannot use [] on a number (a scalar).
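To illustrate the "invalid index to scalar variable" message from the question, here is a minimal sketch. It assumes that y has already been reduced to a single NumPy value; the name `scalar` and the value 1.5 are made up for the example.

```python
import numpy as np

# Hypothetical stand-in for what is left of y after it has been
# indexed down to a single value.
scalar = np.float64(1.5)

message = ""
try:
    scalar[1]  # indexing a NumPy scalar is not allowed
except IndexError as exc:
    message = str(exc)
    print(type(exc).__name__, message)
```

The exception type differs from the plain-int case, but the cause is the same: there is nothing left to index into.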

What happens here:

First Iteration: y=[1,2,3]; i=0
y is assigned: y=y[i] <=> y=[1,2,3][0]
print: print(y) <=> print(1)
Second Iteration: y=1; i=1
y is assigned: y=y[i] <=> y=1[1] <== TypeError: 'int' object is not subscriptable
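For comparison, the same toy loop runs through once the element is bound to a separate name instead of overwriting y. The names `item` and `seen` are only chosen for this illustration.

```python
y = [1, 2, 3]
seen = []
for i in range(3):
    item = y[i]        # y itself is never overwritten
    seen.append(item)
    print(item)
```

After the loop, y is still the full list, so every iteration can index into it.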

TL;DR:

Remove the y = y[i] line and

change epsilon_hat = y - X.dot(betas) to epsilon_hat = y[i] - X.dot(betas)

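or write y_layer = y[i] and use y_layer in the loop. This variant is the safer one here, because the inner loop reuses the name i, so a y[i] placed inside it would index with the inner counter.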
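A minimal sketch of the y_layer pattern, assuming the data is simply a list of columns. The function `stat_per_column`, the sample numbers and the sum of squared successive differences are hypothetical stand-ins, not the DW computation from the question.

```python
def stat_per_column(columns):
    """Compute one value per column and return all values sorted."""
    results = []
    for i in range(len(columns)):
        y_layer = columns[i]               # separate name, columns stays intact
        total = 0.0
        for j in range(1, len(y_layer)):   # inner loop uses j, not i
            total += (y_layer[j] - y_layer[j - 1]) ** 2
        results.append(total)              # one value appended per column
    results.sort()
    return results


# Made-up data: three columns of three values each.
columns = [[1.0, 2.0, 4.0], [0.0, 0.0, 1.0], [3.0, 1.0, 1.0]]
values = stat_per_column(columns)
print(values)
```

Keeping the outer and inner counters under different names (i and j) means each column is picked exactly once and the result holds one entry per column.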

CodePudding user response：

Robin Gugel's first answer solved the problem asked in this post.

So remove y = y[i] and replace it with an assignment to a different name, say: y_layer = y[i]. Also replace the y in epsilon_hat = y - X.dot(betas) with y_layer.