K fold cross validation---KeyError: '[] not in index'

Time:10-12

I am having trouble applying k-fold cross-validation and would appreciate some help. train_test_split works without problems, but KFold raises an error related to indexes.

How do I apply k-fold to my dataset?

My code looks like this:

from sklearn.model_selection import KFold
df = pd.read_csv('CD.TXT',delimiter=',')
df.head() 
X = df[['A', 'B', 'C', 'D']].values
Y=df['Label'].values
X=pd.DataFrame(X)
Y=pd.DataFrame(Y)
cv = KFold(n_splits=10, random_state=42, shuffle=False)
for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]
print(X_train)
print(Y_train)

My dataset looks like this:

A,B,C,D,Label
10,20,30,40,1
20,20,15,60,0
10,20,30,40,1
10,20,30,40,1
10,20,39,40,1
10,20,30,40,1
10,20,30,40,1
10,20,32,40,1
10,20,30,40,1
10,20,30,40,1
10,20,3,40,1
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0

This is the error I am getting:

Test Index:  [18]
Traceback (most recent call last):

  File "<ipython-input-11-10016b897261>", line 1, in <module>
    runfile('D:/experiments/untitled0.py', wdir='D:/experiments')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/experiments/untitled0.py", line 61, in <module>
    X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2934, in __getitem__
    raise_missing=True)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1354, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1252, in _validate_read_indexer
    raise KeyError("{} not in index".format(not_found))

KeyError: '[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] not in index'

CodePudding user response:

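The error occurs because you are indexing a DataFrame with a NumPy array of row positions. On a DataFrame, X[train_index] treats the values as column labels, not as row positions. X only has the columns 0 to 3, so the labels 4 to 17 are not found and pandas raises the KeyError.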
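To make the difference concrete, here is a minimal sketch on a toy frame. The frame and the index array `idx` are made up for illustration and are not the data from the question; `idx` only stands in for the kind of positional array that KFold returns.

```python
import numpy as np
import pandas as pd

# Toy frame built like X in the question: 6 rows, columns labelled 0..3.
X = pd.DataFrame(np.arange(24).reshape(6, 4))
idx = np.array([0, 1, 4, 5])  # hypothetical row positions

# X[idx] looks the values up as COLUMN labels; 4 and 5 are not columns.
try:
    X[idx]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

# .iloc selects ROWS by integer position, which is what the indices mean.
rows = X.iloc[idx]
print(rows.shape)  # (4, 4)
```

The same array works with `.iloc` because it is interpreted as positions along the row axis.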

Try commenting out X=pd.DataFrame(X) and Y=pd.DataFrame(Y), so that X and Y stay NumPy arrays, which can be indexed by position directly:

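import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv('CD.TXT', delimiter=',')
df.head()
X = df[['A', 'B', 'C', 'D']].values  # NumPy array of features
Y = df['Label'].values               # NumPy array of labels
#X=pd.DataFrame(X)
#Y=pd.DataFrame(Y)
# random_state is omitted: it has no effect with shuffle=False,
# and recent scikit-learn versions raise an error for that combination.
cv = KFold(n_splits=10, shuffle=False)
for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
# This runs after the loop, so it uses the indices of the last fold only.
X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]
print(X_train)
print(Y_train)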

Or keep the DataFrames and select rows by position with .iloc:

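import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv('CD.TXT', delimiter=',')
df.head()
X = df[['A', 'B', 'C', 'D']].values
Y = df['Label'].values
X = pd.DataFrame(X)
Y = pd.DataFrame(Y)
# random_state is omitted: it has no effect with shuffle=False,
# and recent scikit-learn versions raise an error for that combination.
cv = KFold(n_splits=10, shuffle=False)
for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
# .iloc selects rows by integer position, matching the indices KFold returns.
# This runs after the loop, so it uses the indices of the last fold only.
X_train, X_test, Y_train, Y_test = X.iloc[train_index,:], X.iloc[test_index,:], Y.iloc[train_index], Y.iloc[test_index]
print(X_train)
print(Y_train)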
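Note that both versions above do the split after the loop has finished, so only the last fold's indices are used. If the goal is to train and evaluate on every fold, the split can be moved inside the loop. The following is only a sketch under assumptions: the data is a random stand-in with the same column names as CD.TXT (not the real file), and n_splits=6 is an arbitrary choice so that the 18 rows divide evenly.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-in for CD.TXT: 18 rows, same column names (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 60, size=(18, 4)), columns=["A", "B", "C", "D"])
df["Label"] = rng.integers(0, 2, size=18)

X = df[["A", "B", "C", "D"]]  # keep as DataFrame
Y = df["Label"]               # keep as Series

cv = KFold(n_splits=6)  # hypothetical choice: 3 test rows per fold
fold_sizes = []
for fold, (train_index, test_index) in enumerate(cv.split(X)):
    # Positional selection per fold; a model would be fit and scored here.
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    Y_train, Y_test = Y.iloc[train_index], Y.iloc[test_index]
    fold_sizes.append((len(X_train), len(X_test)))
    print(f"fold {fold}: train={len(X_train)} test={len(X_test)}")
```

Keeping X and Y as pandas objects preserves the column names, at the cost of having to use .iloc for every positional lookup.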