In short:
How do I write this expression correctly?
[(self._mean,self._var,self._priors)] = [ ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)],[X[y==c].shape[0] / n_samples ]) for c in self.classes]
A minimal reproducible example that generates the same error:
from sklearn.model_selection import train_test_split
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
import time
X, y = datasets.make_classification(n_samples=1000, n_classes=2, n_features=10, random_state=1234)
Classes = [0,1,2,3,4,5,6,7,8,9]
[[_mean, _var]] = [[ (np.mean(X[i==c]),np.var(X[i==c])) for c in Classes ] for i in range(len(X)) ]
print(_mean)
print(_var)
with the following error output:
/bin/python3 "/home/vivek/Documents/GitHub/ML-Coding-Playground/LecturesSeries1/Lecture 5 - Naive Bayes/CodeSample.py"
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3757: RuntimeWarning: Degrees of freedom <= 0 for slice
return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/_methods.py:222: RuntimeWarning: invalid value encountered in true_divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/_methods.py:256: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/home/vivek/Documents/GitHub/ML-Coding-Playground/LecturesSeries1/Lecture 5 - Naive Bayes/CodeSample.py", line 12, in <module>
[[_mean, _var]] = [[ (np.mean(X[i==c]),np.var(X[i==c])) for c in Classes ] for i in range(len(X)) ]
ValueError: too many values to unpack (expected 1)
Context for the line of code:
I am implementing a naive Bayes classifier from scratch, and have written the following script to run my code:
#script.py
#
#
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt
import time
from NaiveBayes import *
def accuracy(y_true, y_pred):
    accuracy = np.sum(y_true == y_pred) / len(y_true)
    return accuracy
X, y = datasets.make_classification(n_samples=1000, n_classes=2, n_features=10, random_state=1234)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=69420)
nb = NaiveBayes()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print("Accuracy:", accuracy(y_test, y_pred))
print( "Confusion Matrix:")
print( np.array([[np.sum(y_test==0),np.sum(y_test==1)],[np.sum(y_pred==0),np.sum(y_pred==1)]]))
I have made a few attempts at the code for my naive Bayes classifier:
- Using a for loop (working)
#NaiveBayes.py
#
#
import numpy as np
class NaiveBayes:
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)
        # init mean, var, priors
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)
        for c in self.classes:
            X_c = X[y == c]
            self._mean[c] = X_c.mean(axis=0)
            self._var[c] = X_c.var(axis=0)
            self._priors[c] = X_c.shape[0] / n_samples
        # trying to use list comprehension to remove the loop
        # self._mean = [X[y==c].mean(axis=0) for c in self.classes]
        # self._var = [X[y==c].var(axis=0) for c in self.classes]
        # self._priors = [X[y==c].shape[0] / n_samples for c in self.classes]
        # Trying to have only one command for all three
        print([c for c in self.classes])  # debugging
        # [(self._mean,self._var,self._priors)] = [ ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)], [X[y==c].shape[0] / n_samples]) for c in self.classes]
        # (self._mean,self._var,self._priors) = ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)], [X[y==c].shape[0] / n_samples] for c in self.classes)
        # debugging
        print(self._mean)
        print(self._var)
        print(self._priors)

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        posteriors = [self._posterior(x, c, idx) for (idx, c) in enumerate(self.classes)]
        return self.classes[np.argmax(posteriors)]

    def _posterior(self, x, c, idx):
        prior = np.log(self._priors[idx])
        likelihood = np.prod(self._likelihood(idx, x))
        return prior + np.log(likelihood)

    def _likelihood(self, class_idx, x):
        # x is a single sample; class_idx is the index of its candidate class.
        # Returns the per-feature Gaussian PDF of x given the class's mean and
        # variance, i.e. the likelihood of the sample under that class.
        # (This is the _pdf function from the video.)
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        coeff = 1.0 / np.sqrt(2 * np.pi * var)
        exp = np.exp(-(x - mean) ** 2 / (2 * var))
        return coeff * exp
- Using three list comprehensions (working)
#NaiveBayes.py
#
#
import numpy as np
class NaiveBayes:
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)
        # init mean, var, priors
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)
        # for c in self.classes:
        #     X_c = X[y==c]
        #     self._mean[c] = X_c.mean(axis=0)
        #     self._var[c] = X_c.var(axis=0)
        #     self._priors[c] = X_c.shape[0] / n_samples
        # trying to use list comprehension to remove the loop
        self._mean = [X[y == c].mean(axis=0) for c in self.classes]
        self._var = [X[y == c].var(axis=0) for c in self.classes]
        self._priors = [X[y == c].shape[0] / n_samples for c in self.classes]
        # Trying to have only one command for all three
        print([c for c in self.classes])  # debugging
        # [(self._mean,self._var,self._priors)] = [ ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)], [X[y==c].shape[0] / n_samples]) for c in self.classes]
        # (self._mean,self._var,self._priors) = ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)], [X[y==c].shape[0] / n_samples] for c in self.classes)
        # debugging
        print(self._mean)
        print(self._var)
        print(self._priors)

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        posteriors = [self._posterior(x, c, idx) for (idx, c) in enumerate(self.classes)]
        return self.classes[np.argmax(posteriors)]

    def _posterior(self, x, c, idx):
        prior = np.log(self._priors[idx])
        likelihood = np.prod(self._likelihood(idx, x))
        return prior + np.log(likelihood)

    def _likelihood(self, class_idx, x):
        # x is a single sample; class_idx is the index of its candidate class.
        # Returns the per-feature Gaussian PDF of x given the class's mean and
        # variance, i.e. the likelihood of the sample under that class.
        # (This is the _pdf function from the video.)
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        coeff = 1.0 / np.sqrt(2 * np.pi * var)
        exp = np.exp(-(x - mean) ** 2 / (2 * var))
        return coeff * exp
- Using one list comprehension and NumPy array manipulation (not working; I don't understand the error)
#NaiveBayes.py
#
#
import numpy as np
class NaiveBayes:
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)
        # init mean, var, priors
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)
        # for c in self.classes:
        #     X_c = X[y==c]
        #     self._mean[c] = X_c.mean(axis=0)
        #     self._var[c] = X_c.var(axis=0)
        #     self._priors[c] = X_c.shape[0] / n_samples
        # trying to use list comprehension to remove the loop
        # self._mean = [X[y==c].mean(axis=0) for c in self.classes]
        # self._var = [X[y==c].var(axis=0) for c in self.classes]
        # self._priors = [X[y==c].shape[0] / n_samples for c in self.classes]
        # Trying to have only one command for all three
        print([c for c in self.classes])  # debugging
        print(np.array([[np.array([X[y == c].mean(axis=0)]).flatten(), np.array([X[y == c].var(axis=0)]).flatten(), np.array([X[y == c].shape[0] / n_samples]).flatten()] for c in self.classes], dtype=object).flatten())  # debugging
        TempArray = np.array([[np.array([X[y == c].mean(axis=0)]).flatten(), np.array([X[y == c].var(axis=0)]).flatten(), np.array([X[y == c].shape[0] / n_samples]).flatten()] for c in self.classes]).flatten()
        self._mean = TempArray[0]
        self._var = TempArray[1]
        self._priors = TempArray[2]
        # [(self._mean,self._var,self._priors)] = [ ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)], [X[y==c].shape[0] / n_samples]) for c in self.classes]
        # (self._mean,self._var,self._priors) = ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)], [X[y==c].shape[0] / n_samples] for c in self.classes)
        # debugging
        print(self._mean)
        print(self._var)
        print(self._priors)

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        posteriors = [self._posterior(x, c, idx) for (idx, c) in enumerate(self.classes)]
        return self.classes[np.argmax(posteriors)]

    def _posterior(self, x, c, idx):
        prior = np.log(self._priors[idx])
        likelihood = np.prod(self._likelihood(idx, x))
        return prior + np.log(likelihood)

    def _likelihood(self, class_idx, x):
        # x is a single sample; class_idx is the index of its candidate class.
        # Returns the per-feature Gaussian PDF of x given the class's mean and
        # variance, i.e. the likelihood of the sample under that class.
        # (This is the _pdf function from the video.)
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        coeff = 1.0 / np.sqrt(2 * np.pi * var)
        exp = np.exp(-(x - mean) ** 2 / (2 * var))
        return coeff * exp
and the one that has the error:
- Using one list comprehension and iterable unpacking (fails at line 33 with: ValueError: too many values to unpack (expected 1))
#NaiveBayes.py
#
#
import numpy as np
class NaiveBayes:
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)
        # init mean, var, priors
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)
        # for c in self.classes:
        #     X_c = X[y==c]
        #     self._mean[c] = X_c.mean(axis=0)
        #     self._var[c] = X_c.var(axis=0)
        #     self._priors[c] = X_c.shape[0] / n_samples
        # trying to use list comprehension to remove the loop
        # self._mean = [X[y==c].mean(axis=0) for c in self.classes]
        # self._var = [X[y==c].var(axis=0) for c in self.classes]
        # self._priors = [X[y==c].shape[0] / n_samples for c in self.classes]
        # Trying to have only one command for all three
        print([c for c in self.classes])  # debugging
        # print(np.array([[np.array([X[y==c].mean(axis=0)]).flatten(), np.array([X[y==c].var(axis=0)]).flatten(), np.array([X[y==c].shape[0] / n_samples]).flatten()] for c in self.classes], dtype=object).flatten())  # debugging
        # TempArray = np.array([[np.array([X[y==c].mean(axis=0)]).flatten(), np.array([X[y==c].var(axis=0)]).flatten(), np.array([X[y==c].shape[0] / n_samples]).flatten()] for c in self.classes]).flatten()
        # self._mean = TempArray[0]
        # self._var = TempArray[1]
        # self._priors = TempArray[2]
        [(self._mean, self._var, self._priors)] = [([X[y == c].mean(axis=0)], [X[y == c].var(axis=0)], [X[y == c].shape[0] / n_samples]) for c in self.classes]
        # (self._mean,self._var,self._priors) = ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)], [X[y==c].shape[0] / n_samples] for c in self.classes)
        # debugging
        print(self._mean)
        print(self._var)
        print(self._priors)

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        posteriors = [self._posterior(x, c, idx) for (idx, c) in enumerate(self.classes)]
        return self.classes[np.argmax(posteriors)]

    def _posterior(self, x, c, idx):
        prior = np.log(self._priors[idx])
        likelihood = np.prod(self._likelihood(idx, x))
        return prior + np.log(likelihood)

    def _likelihood(self, class_idx, x):
        # x is a single sample; class_idx is the index of its candidate class.
        # Returns the per-feature Gaussian PDF of x given the class's mean and
        # variance, i.e. the likelihood of the sample under that class.
        # (This is the _pdf function from the video.)
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        coeff = 1.0 / np.sqrt(2 * np.pi * var)
        exp = np.exp(-(x - mean) ** 2 / (2 * var))
        return coeff * exp
- Other failed attempts, such as:
(self._mean,self._var,self._priors) = ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)], [X[y==c].shape[0] / n_samples] for c in self.classes)
Can you explain the correct way to do this, and why my other approaches are failing?
Thank you for your time.
CodePudding user response:
It looks like you're trying to compute the mean and variance of each column in X. You can accomplish this without loops like so:
mean = np.mean(X, axis=0)
var = np.var(X, axis=0)
In general, you rarely (almost never) need loops with NumPy. The axis argument tells NumPy to compute along rows or columns (as opposed to computing the statistics for the whole array).
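For instance, a quick sketch with a small illustrative array:
import numpy as np

X = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]] -- 2 rows, 3 columns
print(np.mean(X))               # 2.5            -- one scalar for the whole array
print(np.mean(X, axis=0))       # [1.5 2.5 3.5]  -- one mean per column
print(np.mean(X, axis=1))       # [1. 4.]        -- one mean per row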
(By the way, the columns of X are usually referred to as 'features'. Most people use the word 'classes' for the unique values of y, which are 0 and 1 in this example.)
CodePudding user response:
Look at what a list comprehension that does 2 things in the body produces:
In [122]: alist = [(i,i*2) for i in range(3)]
In [123]: alist
Out[123]: [(0, 0), (1, 2), (2, 4)]
That's one list with 3 items; you cannot unpack it into two lists.
A list comprehension is a streamlined way of writing a loop with an append:
In [125]: alist = []
...: for i in range(3):
...: alist.append((i,2*i))
...: alist
Out[125]: [(0, 0), (1, 2), (2, 4)]
The loop you are trying to rewrite does several things in the body:
for c in self.classes:
    X_c = X[y==c]
    self._mean[c] = X_c.mean(axis=0)
    self._var[c] = X_c.var(axis=0)
    self._priors[c] = X_c.shape[0] / n_samples
You managed to rewrite it as 3 list comprehensions, but that doesn't save time: that's 3 iterations instead of one. And as the example above shows, you can't unpack a single comprehension into 3 variables.
Well, there is a way: apply a list version of transpose to the list:
In [126]: list(zip(*alist))
Out[126]: [(0, 1, 2), (0, 2, 4)]
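Applied to your fit method, that gives a single comprehension (a sketch using your existing names; note that each attribute ends up as a tuple of per-class values, so wrap it in np.array if you need array indexing later):
self._mean, self._var, self._priors = zip(*[
    (X[y == c].mean(axis=0),           # per-class feature means
     X[y == c].var(axis=0),            # per-class feature variances
     X[y == c].shape[0] / n_samples)   # per-class prior
    for c in self.classes
])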
Using whole-array computations, as suggested in the other answer, is better, but I thought you needed a basic look at list comprehensions as well.
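(For reference, a minimal sketch of that whole-array style, using a boolean membership mask; the onehot and counts names are mine, and it assumes np.var's default ddof=0, matching your loop:)
onehot = (y[:, None] == self.classes[None, :]).astype(float)     # (n_samples, n_classes)
counts = onehot.sum(axis=0)                                      # samples per class
self._priors = counts / n_samples
self._mean = (onehot.T @ X) / counts[:, None]                    # per-class feature means
self._var = (onehot.T @ X**2) / counts[:, None] - self._mean**2  # E[x^2] - E[x]^2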
Unpacking can make for nice, compact code, but it is quite unforgiving when it comes to matching the number of values.
The 2-element list in Out[126] can be unpacked into 2 variables:
In [127]: a,b = Out[126]
In [128]: [a,b] = Out[126] # or (a,b)= all the same thing
In [129]: a
Out[129]: (0, 1, 2)
but not if you add a layer of []:
In [130]: [[a,b]] = Out[126]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [130], in <cell line: 1>()
----> 1 [[a,b]] = Out[126]
ValueError: too many values to unpack (expected 1)
This extra layer of unpacking only works when the right-hand side has exactly one item:
In [133]: [[a,b]] = Out[123][1:2]
In [134]: Out[123][1:2]
Out[134]: [(1, 2)]
Note the same layers of nesting on both sides of the assignment. That's important when unpacking.