I have a dataframe that looks like this one:
import numpy as np
import pandas as pd
column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight',
                  'shucked_weight', 'viscera_weight', 'shell_weight',
                  'rings']
# Data source: https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/
abalone = pd.read_csv('abalone.data', header=None, names=column_headers)
# Shuffle the rows before splitting for cross-validation
shuffled_index = np.random.permutation(len(scaled))
shuffled_df = scaled.reindex(shuffled_index)
shuffled_df.head()
# Split the dataset into k folds for cross-validation
k = 4
folds = np.array_split(shuffled_df, k)
....
k = 5
scores = list()
for fold in folds:
    training_set = list(folds)
    training_set.remove(fold)
    training_set = pd.concat(training_set)
    d = fold.apply(lambda row: distance(row, training_set, k), axis=1)
    error = root_mean_squared_error(fold['rings'], d)
    scores.append(error)
>>
ValueError: Can only compare identically-labeled DataFrame objects
I am implementing the k-nearest neighbors algorithm with pandas and NumPy. I have a list of dataframes (the folds), and inside the loop I can't remove the current fold from a copy of that list with list.remove(). How can I remove the fold I am currently looping over, so that I can concatenate the remaining ones into the training set for k-fold cross-validation?
CodePudding user response:
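You can delete the dataframe from your list by index with del. list.remove() looks for the item by comparing elements with ==, and comparing two dataframes that have different row labels raises the ValueError you are seeing. Deleting by position avoids that comparison entirely.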
Try this minimal example:
import numpy as np
import pandas as pd

column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight',
                  'shucked_weight', 'viscera_weight', 'shell_weight',
                  'rings']
abalone = pd.read_csv('__data_input/abalone.data', header=None, names=column_headers)
...
folds = np.array_split(abalone, 4)
for idx, fold in enumerate(folds):
    # Shallow copy of the list, so deleting from it leaves folds intact
    training_set = folds.copy()
    # Drop the current fold by position instead of by value
    del training_set[idx]
    training_set = pd.concat(training_set)
    ...
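If you want to try the idea without the abalone file, here is a small self-contained sketch. The toy dataframe, the helper name training_without and the variable n_folds are made up for illustration and are not part of the question's code. It also splits row positions and slices with iloc rather than passing the dataframe straight to np.array_split, which is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the abalone data: 10 rows, two columns.
df = pd.DataFrame({"length": np.linspace(0.1, 1.0, 10),
                   "rings": np.arange(10)})

n_folds = 4
# Split the row positions into n_folds nearly equal chunks,
# then slice the dataframe with each chunk of positions.
positions = np.array_split(np.arange(len(df)), n_folds)
folds = [df.iloc[pos] for pos in positions]


def training_without(folds, idx):
    """Concatenate every fold except the one at position idx."""
    remaining = list(folds)  # shallow copy of the list, frames are not copied
    del remaining[idx]       # remove by position, no dataframe comparison
    return pd.concat(remaining)


train_sizes = []
for idx, fold in enumerate(folds):
    training_set = training_without(folds, idx)
    # The held-out fold and the training set must not share any rows.
    assert training_set.index.intersection(fold.index).empty
    train_sizes.append(len(training_set))

print(train_sizes)  # [7, 7, 8, 8]
```

An equivalent one-liner, if you prefer not to mutate a copy, is pd.concat(folds[:idx] + folds[idx + 1:]).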