I am having trouble applying k-fold cross-validation. train_test_split works without problems, but KFold raises an error related to indexes.
How do I apply k-fold to my dataset?
My code looks like this:
import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv('CD.TXT', delimiter=',')
df.head()
X = df[['A', 'B', 'C', 'D']].values
Y = df['Label'].values
X = pd.DataFrame(X)
Y = pd.DataFrame(Y)
cv = KFold(n_splits=10, random_state=42, shuffle=False)
for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
    X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]
    print(X_train)
    print(Y_train)
My dataset looks like this:
A,B,C,D,Label
10,20,30,40,1
20,20,15,60,0
10,20,30,40,1
10,20,30,40,1
10,20,39,40,1
10,20,30,40,1
10,20,30,40,1
10,20,32,40,1
10,20,30,40,1
10,20,30,40,1
10,20,3,40,1
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0
This is the error I am getting:
Test Index: [18]
Traceback (most recent call last):
  File "<ipython-input-11-10016b897261>", line 1, in <module>
    runfile('D:/experiments/untitled0.py', wdir='D:/experiments')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/experiments/untitled0.py", line 61, in <module>
    X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2934, in __getitem__
    raise_missing=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1354, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1252, in _validate_read_indexer
    raise KeyError("{} not in index".format(not_found))
KeyError: '[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] not in index'
CodePudding user response:
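The error occurs because X and Y are DataFrames, and X[train_index] on a DataFrame selects columns by label, not rows by position. The arrays returned by cv.split(X) are positional row indices, so pandas looks for columns named 4, 5, ..., 17, finds only columns 0 to 3, and raises the KeyError.
Either keep X and Y as NumPy arrays by commenting out X=pd.DataFrame(X) and Y=pd.DataFrame(Y):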
import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv('CD.TXT', delimiter=',')
df.head()
# .values returns NumPy arrays, which accept integer index arrays directly
X = df[['A', 'B', 'C', 'D']].values
Y = df['Label'].values
#X=pd.DataFrame(X)
#Y=pd.DataFrame(Y)
# random_state is omitted: it has no effect when shuffle=False, and
# recent scikit-learn versions raise a ValueError if both are given
cv = KFold(n_splits=10, shuffle=False)
for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
    X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]
    print(X_train)
    print(Y_train)
Or keep the DataFrames and select rows by position with .iloc:
import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv('CD.TXT', delimiter=',')
df.head()
X = df[['A', 'B', 'C', 'D']].values
Y = df['Label'].values
X = pd.DataFrame(X)
Y = pd.DataFrame(Y)
# random_state is omitted: it has no effect when shuffle=False, and
# recent scikit-learn versions raise a ValueError if both are given
cv = KFold(n_splits=10, shuffle=False)
for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
    # .iloc indexes rows by position, which is what KFold returns
    X_train, X_test, Y_train, Y_test = X.iloc[train_index, :], X.iloc[test_index, :], Y.iloc[train_index], Y.iloc[test_index]
    print(X_train)
    print(Y_train)
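
To see the difference between the two kinds of indexing without needing CD.TXT, here is a minimal self-contained sketch. It assumes the 18 sample rows from the question are pasted inline (CSV_TEXT is a hypothetical stand-in for the file), keeps the column names instead of converting to .values first, and uses n_splits=6 only so that each fold gets exactly 3 test rows. The names rows, picked, label_lookup_failed and fold_sizes are made up for illustration.

```python
import io

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Inline copy of the sample rows from the question (stand-in for CD.TXT).
CSV_TEXT = """A,B,C,D,Label
10,20,30,40,1
20,20,15,60,0
10,20,30,40,1
10,20,30,40,1
10,20,39,40,1
10,20,30,40,1
10,20,30,40,1
10,20,32,40,1
10,20,30,40,1
10,20,30,40,1
10,20,3,40,1
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
X = df[["A", "B", "C", "D"]]
Y = df["Label"]

rows = np.array([4, 5, 6])

# X[rows] is a column lookup by label; there are no columns named 4, 5, 6.
try:
    X[rows]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# X.iloc[rows] is a row lookup by position, which is what KFold indices mean.
picked = X.iloc[rows]

# Assumption for this sketch: 6 folds over 18 rows, no shuffling.
cv = KFold(n_splits=6)
fold_sizes = []
for train_index, test_index in cv.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    Y_train, Y_test = Y.iloc[train_index], Y.iloc[test_index]
    fold_sizes.append((len(X_train), len(X_test)))

print("Label lookup failed:", label_lookup_failed)
print("Fold sizes (train, test):", fold_sizes)
```

Note that without shuffling, the last folds of this sample contain only rows with Label 0. If that matters for your model, KFold(shuffle=True, random_state=42) or StratifiedKFold may be worth trying.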