Remove DataFrames from a list of DataFrames

Time:08-24

I have a dataframe that looks like this one:

[Image: DataFrame abalone.csv]

import numpy as np
import pandas as pd

column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight', 
                  'shucked_weight', 'viscera_weight', 'shell_weight', 
                  'rings']

# Data source: https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/
    
abalone = pd.read_csv('abalone.data', header=None, names=column_headers)

# Shuffle the rows before splitting for cross-validation
# (`scaled` is not defined in this snippet)

shuffled_index = np.random.permutation(len(scaled))

shuffled_df = scaled.reindex(shuffled_index)

shuffled_df.head()

# Split the shuffled dataset into k folds for cross-validation

k = 4
folds = np.array_split(shuffled_df, k)

....

k = 5
scores = list()
for fold in folds:
    training_set = list(folds)
    training_set.remove(fold)
    training_set = pd.concat(training_set)
    d = fold.apply(lambda row: distance(row, training_set, k), axis=1)
    error = root_mean_squared_error(fold['rings'], d)
    scores.append(error)
   
>>

ValueError: Can only compare identically-labeled DataFrame objects


I am implementing the k-nearest neighbors algorithm with pandas and NumPy. I have a list of DataFrames (the folds), and calling list.remove() on the fold I am currently looping over raises the error above. How can I remove the current fold from the list so that I can concatenate the remaining ones for k-fold cross-validation?

Answer:

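list.remove() finds the item by comparing it with == against each element of the list, and comparing two DataFrames with different labels raises this ValueError. Instead, delete the DataFrame from your list by position with del.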
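To see where the error comes from, here is a small sketch that reproduces it on toy data. The frame `df` and its two folds are hypothetical stand-ins for the shuffled abalone data, not the asker's actual frames. It also shows that an identity check (`is not`) avoids the DataFrame comparison entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the shuffled abalone frame.
df = pd.DataFrame({"rings": np.arange(8)})
folds = [df.iloc[0:4], df.iloc[4:8]]  # two folds with different row labels

target = folds[1]
try:
    # list.remove checks each element for identity first and then with ==.
    # folds[0] is a different object from target, so == is evaluated,
    # and pandas raises because the two frames are labeled differently.
    folds.remove(target)
    outcome = "removed"
except ValueError as exc:
    outcome = f"ValueError: {exc}"

print(outcome)

# An identity test never calls DataFrame.__eq__, so it is safe here.
remaining = [f for f in folds if f is not target]
```

Removing the first fold happens to work, because the identity check succeeds before any == comparison is made; the failure only shows up from the second fold onward.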

Try this minimal example:

import numpy as np
import pandas as pd

column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight',
                  'shucked_weight', 'viscera_weight', 'shell_weight', 
                  'rings']

 
abalone = pd.read_csv('__data_input/abalone.data', header=None, names=column_headers)

...

folds = np.array_split(abalone, 4)

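for idx, fold in enumerate(folds):
    # Shallow-copy the list so that `folds` itself is left unchanged
    training_set = folds.copy()
    # Remove the current fold by position; no DataFrame comparison is involved
    del training_set[idx]
    # Concatenate the remaining folds into a single training set
    training_set = pd.concat(training_set)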
    ...
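For a version that runs without the data file, the same idea can be sketched on synthetic data. Everything here (`data`, `n_folds`, `splits`) is a made-up illustration rather than part of the original code. It slices the list around `idx` instead of copying and deleting, and it splits row positions rather than the DataFrame itself, which is one way to avoid relying on how np.array_split handles a DataFrame.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the abalone data (assumption: 10 rows, 2 columns).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "length": rng.random(10),
    "rings": rng.integers(1, 20, 10),
})

n_folds = 4
# Split the row positions into n_folds chunks, then slice the frame with iloc.
positions = np.array_split(np.arange(len(data)), n_folds)
folds = [data.iloc[pos] for pos in positions]

splits = []
for idx, fold in enumerate(folds):
    # Concatenate every fold except the one at position idx.
    training_set = pd.concat(folds[:idx] + folds[idx + 1:])
    splits.append((training_set, fold))
```

Each entry of `splits` is a (training_set, fold) pair, so the distance and error computations from the question could be applied to each pair in turn.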