'numpy.ndarray' object has no attribute 'columns'

I was following a machine learning tutorial on YouTube and using this dataset. The person in the video had no problem running the code, but I received an error saying that the 'numpy.ndarray' object has no attribute 'columns'.

Below is the code I ran:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler

cols = ['integrated_mean','integrated_standard_deviation','integrated_excess_kurtosis','integrated_skewness','DM_mean','DM_standard_deviation','DM_excess_kurtosis','DM_skewness','class']
df = pd.read_csv("HTRU_2.data", names = cols)

train, valid, test = np.split(df.sample(frac = 1), [int(0.6*len(df)), int(0.8*len(df))])

def scale_dataset(dataframe, oversample = False):
    X = dataframe[dataframe.columns[:-1]].values
    y = dataframe[dataframe.columns[-1]].values

    scaler = StandardScaler()
    X = scaler.fit_transform(X)

    if oversample:
        ros = RandomOverSampler()
        X, y = ros.fit_resample(X, y)

    data = np.hstack((X, np.reshape(y, (-1, 1))))

    return data, X, y

train, X_train, y_train = scale_dataset(train, oversample = True)
valid, X_train, y_train = scale_dataset(train, oversample = False)
test, X_train, y_train = scale_dataset(train, oversample = False)

I do not know what is happening or how to fix it. I've tried searching elsewhere, but I still have no idea. If anyone can help, it would be much appreciated.

CodePudding user response：

I couldn't find the exact minute in the tutorial, but maybe it's just a consequence of copy-paste.

In the function scale_dataset you build data as a NumPy array (the result of np.hstack) and return it, and the first call assigns that array to the train variable. When you then call scale_dataset again for the validation set, you pass in this same train, which the function expects to be a pandas DataFrame, but at that moment it is a NumPy array, and a NumPy array has no columns attribute.
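The failure can be reproduced in a few lines. This is only a sketch on a made-up two-column frame (the toy data and the `message` variable are assumptions for illustration, not part of the original code): once the values have gone through np.hstack, the result is a plain ndarray, so accessing `.columns` on it fails.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "class": [0, 1, 0]})

# Same column-based split as in scale_dataset; works on a DataFrame.
X = df[df.columns[:-1]].values
y = df[df.columns[-1]].values

# np.hstack returns a plain ndarray, not a DataFrame.
data = np.hstack((X, np.reshape(y, (-1, 1))))

print(type(data).__name__)       # ndarray
print(hasattr(df, "columns"))    # True
print(hasattr(data, "columns"))  # False

# Accessing .columns on the array raises the error from the question.
try:
    data.columns
except AttributeError as err:
    message = str(err)
    print(message)
```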

I assume you meant to pass the valid and test data sets instead of train. It also makes sense to store the results under separate names, so that X_train and y_train are not overwritten by the later calls:

train, X_train, y_train = scale_dataset(train, oversample = True)
valid, X_valid, y_valid = scale_dataset(valid, oversample = False)
test, X_test, y_test = scale_dataset(test, oversample = False)

CodePudding user response：

Instead of

X = dataframe[dataframe.columns[:-1]].values
y = dataframe[dataframe.columns[-1]].values

I did

X = dataframe[:, :-1]
y = dataframe[:, -1]

And now all the code works fine.
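One caveat: positional slicing such as `dataframe[:, :-1]` works when a NumPy array is passed in, but a pandas DataFrame does not support this indexing directly and raises an error. A possible way to accept both types is to convert to an array first. The helper name `split_features_labels` and the toy data below are hypothetical, a sketch rather than the tutorial's code:

```python
import numpy as np
import pandas as pd

def split_features_labels(data):
    """Return (X, y), treating the last column as the label.

    Hypothetical helper: accepts a pandas DataFrame or a 2-D ndarray.
    """
    # Convert a DataFrame to an ndarray; leave array-like input as an array.
    arr = data.to_numpy() if isinstance(data, pd.DataFrame) else np.asarray(data)
    return arr[:, :-1], arr[:, -1]

# Toy frame used only for illustration.
frame = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "class": [0, 1]})

X_df, y_df = split_features_labels(frame)             # DataFrame input
X_np, y_np = split_features_labels(frame.to_numpy())  # ndarray input

print(X_df.shape, y_df.shape)
print(np.array_equal(X_df, X_np), np.array_equal(y_df, y_np))
```

With this kind of conversion the function no longer depends on whether an earlier call already turned the data into an array, although passing the right variable to each call remains the actual fix.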