import numpy as np
import pandas as pd

Y_0 = pd.read_csv('Y_0.txt', header=None)  # text file of size 100 x 10000
I want 10000 iterations of 100 rows each, hence:
Y_0 = np.hsplit(Y_0,10000)
X = pd.read_csv('X.txt',header=None) #text file of size 100*9
mu = 0 #normal mean
sigma = 1 # normal standard deviation (np.random.normal takes the standard deviation, not the variance)
n=100
B=10000
np.random.seed(123)
def DW(y,B):
    d_test = []
    for i in range(B):
        y = y[i]
        numerator_d_hat = 0
        denominator_d_hat = 0
        for i in range(1,n):
            epsilon_hat = np.random.normal(mu, sigma, n)
            #print(epsilon_hat)
            betas = (A.dot(np.transpose(X)))*(epsilon_hat)
            #print(X.dot(betas))
            epsilon_hat = y - X.dot(betas)
            epsilon_hat = epsilon_hat.iloc[:,0]
            numerator_d_hat = (epsilon_hat[i] - epsilon_hat[i-1])**2
            denominator_d_hat = epsilon_hat[i-1]**2
        value_d = numerator_d_hat / denominator_d_hat
        d_test.append(value_d)
    d_test_hats = np.array(d_test)
    d_test_hats.sort()
    return(d_test_hats)
print(DW(y,B))
I want to build a sorted array of all DW values, but it only works for the first 1x100 block. With other approaches I get either KeyError: 1 or "invalid index to scalar variable".
I have searched extensively and tried many ways of indexing y to make this work.
The values of d, epsilon_hat and betas are all fine if I do y = y[i] once, outside of a loop, but whenever I do it inside the loop it only gives errors.
Does anyone know how I can write something like y = y[i] so that all 10000 columns end up in the array d_test_hats, and not just one?
CodePudding user response:
Your problem is most likely the y = y[i] line.
Don't reassign y in the loop. I'm not sure what your y is; I assume it's a multidimensional array. Each time you iterate through the loop, you remove one dimension from y.
Short explanation of where your code goes wrong:
y = [1,2,3]
for i in range(3):
    y = y[i]
    print(y)
will have the output:
1
TypeError: 'int' object is not subscriptable
Your errors say pretty much the same thing. "invalid index to scalar variable" is what numpy raises when you index a numpy scalar, so the message is different, but the cause is the same: you cannot use [] on a number (aka scalar).
What happens here:
- First iteration: y=[1,2,3]; i=0
  y is assigned: y=y[i] <=> y=[1,2,3][0]
  print: print(y) <=> print(1)
- Second iteration: y=1; i=1
  y is assigned: y=y[i] <=> y=1[1] <== TypeError: 'int' object is not subscriptable
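The same collapse can be reproduced with a NumPy array, which is probably closer to the data in the question. This is a minimal sketch, assuming a small 2-D array as a stand-in for the real y; the values and the errors list are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for y: a 2-D array (two rows of three values).
y = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

errors = []
for i in range(3):
    try:
        y = y[i]  # each pass overwrites y with one of its own elements
    except IndexError as exc:
        # Reached once y has been reduced to a NumPy scalar.
        errors.append((i, str(exc)))
        break
```

The loop survives two passes because the array has two dimensions: first y becomes a row, then a single number. The third pass tries to index that NumPy scalar and fails with "invalid index to scalar variable", the message quoted in the question.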
TL;DR:
Remove the y = y[i] line and change epsilon_hat = y - X.dot(betas) to epsilon_hat = y[i] - X.dot(betas), or write y_layer = y[i] and use y_layer in the loop.
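As a rough illustration of the y_layer pattern, here is a self-contained sketch that loops over all columns without overwriting the input. Everything in it is an assumption made for the example: the data is random, the sizes are small, dw_all is a hypothetical name, the residuals come from an ordinary least-squares fit via np.linalg.lstsq, and the statistic is the textbook Durbin-Watson ratio (sum of squared successive differences over sum of squared residuals), which is not the same computation as the code in the question.

```python
import numpy as np

rng = np.random.default_rng(123)
n, B, k = 20, 5, 3            # small hypothetical sizes (the question uses n=100, B=10000)
X = rng.normal(size=(n, k))   # stand-in design matrix
Y = rng.normal(size=(n, B))   # stand-in for Y_0: one column per replication


def dw_all(Y, X):
    """Return the sorted Durbin-Watson statistics, one per column of Y."""
    stats = []
    for j in range(Y.shape[1]):
        y_layer = Y[:, j]  # pick column j; Y itself is never reassigned
        beta, *_ = np.linalg.lstsq(X, y_layer, rcond=None)
        resid = y_layer - X @ beta  # OLS residuals for this column
        d = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
        stats.append(d)
    return np.sort(np.array(stats))


d_sorted = dw_all(Y, X)
```

Because y_layer is a fresh name on every pass, Y keeps its full shape and each of the B columns contributes one value to the result.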
CodePudding user response:
Robin Gugel's answer solved the problem asked in this post.
So: remove y = y[i] and replace it with an assignment to a different name, say y_layer = y[i]. Also replace the y in epsilon_hat = y - X.dot(betas) with y_layer.