I'm trying to train an ANN but I get this error:
IndexError: Target 5 is out of bounds
I believe the problem is in this part of my code where I split the data:
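import pandas as pd
from sklearn.datasets import fetch_california_housing

california = fetch_california_housing()
data = pd.DataFrame(california.data)
data.columns = california.feature_names
data['Price'] = california.target  # continuous target: median house value
X = data.iloc[:, 0:8]  # the 8 feature columns, selected by position
y = data.iloc[:, 8]  # the 'Price' column, selected by position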
Is there something I'm doing wrong here?
Answer:
Adding new columns and then selecting them by column index is error-prone.
In scikit-learn >= 0.23.0, fetch_california_housing can already return a DataFrame via the as_frame parameter.
If you need dataframes, your code should be structured like this:
from sklearn.datasets import fetch_california_housing
california = fetch_california_housing(as_frame=True)
X = california.data  # DataFrame with the 8 named feature columns
y = california.target  # Series with the target
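The fragility of positional selection is easy to reproduce. This is a minimal sketch with a small, made-up DataFrame (the values and the extra column named Extra are hypothetical, not the real dataset). It shows that an iloc position silently changes meaning when a column is inserted, while selecting by name keeps working:

```python
import pandas as pd

# Hypothetical toy frame standing in for the housing data (values are made up).
data = pd.DataFrame({
    "MedInc": [8.3, 7.2, 5.6],
    "HouseAge": [41.0, 21.0, 52.0],
})
data["Price"] = [4.5, 3.6, 3.5]

# Positional selection: column 2 is "Price" only because of the current order.
y_positional = data.iloc[:, 2]

# Insert a hypothetical column at the front; position 2 now points elsewhere.
data.insert(0, "Extra", [0, 0, 0])
y_shifted = data.iloc[:, 2]  # silently returns "HouseAge", not "Price"

# Name-based selection is independent of column order.
X = data.drop(columns="Price")
y = data["Price"]

print(y_positional.name, y_shifted.name, y.name)
```

Selecting by name (or using as_frame=True, which returns X and y separately) avoids this class of bug entirely.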
CodePudding user response:
This might help:
from sklearn.datasets import fetch_california_housing
# features and target together in a single DataFrame
df = fetch_california_housing(as_frame=True).frame

# features and target as NumPy arrays
dataset = fetch_california_housing()
X, y = dataset.data, dataset.target
features = dataset.feature_names
print(features)
# ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
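Separately from how X and y are selected, the California target is continuous: it is the median house value in units of $100,000, with a maximum of about 5.0. The message "Target 5 is out of bounds" is what PyTorch's classification losses (nn.CrossEntropyLoss, nn.NLLLoss) raise when a target class index is not smaller than the number of model outputs, so a likely cause is training a classifier on these prices. Assuming the network is built in PyTorch (the question does not show the model), this sketch uses a few hypothetical target values to reproduce the error and then swaps in a regression setup with one output unit and nn.MSELoss:

```python
import torch
import torch.nn as nn

# Hypothetical continuous targets in the same range as the housing prices.
y = torch.tensor([0.9, 2.4, 3.1, 5.0])
x = torch.randn(4, 8)  # 4 samples, 8 features like the housing data

# Classification setup: casting prices to integers produces the label 5,
# which is out of range for a model with 5 outputs (valid classes 0..4).
clf = nn.Linear(8, 5)
try:
    nn.CrossEntropyLoss()(clf(x), y.long())
    raised = False
except IndexError as err:
    raised = True
    print(err)  # the "Target 5 is out of bounds" error

# Regression setup: a single output unit and a regression loss on float targets.
reg = nn.Linear(8, 1)
loss = nn.MSELoss()(reg(x).squeeze(1), y)
print(loss.item())
```

With a regression loss, y stays a float tensor of shape (N,) matching the squeezed model output; it is casting the prices to integer class labels that produces values like 5.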