Code:
import numpy as np
import pandas as pd
from math import exp
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
data = pd.read_csv("haberman.csv")
df = pd.DataFrame(data)
value = [1] * 170
#inserting a bias
df = df.insert(0, "Atr0", value, False)
x = data.iloc[: , :-1]
y = data.iloc[: , -1]
x = np.array(x)
y = np.array(y)
Error:
ValueError: Length of values does not match length of index
haberman.csv contains the 4 columns of Haberman's Cancer Survival dataset.
CodePudding user response:
This error occurs at the line df = df.insert(0, "Atr0", value, False). As the error message indicates, the number of values must match the number of rows, so could you please check whether len(value) == len(df.index) holds?
Another issue: DataFrame.insert(...) updates the DataFrame in place and returns None, so do not use the return value.
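Both points can be checked on a small example. The sketch below uses a made-up three-row DataFrame in place of haberman.csv (the column names "age" and "status" are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-in for the real data; the column names are made up.
df = pd.DataFrame({"age": [30, 31, 33], "status": [1, 1, 2]})

# A list whose length differs from the number of rows is rejected.
try:
    df.insert(0, "Atr0", [1] * 2)
    length_error = None
except ValueError as err:
    length_error = str(err)

# With a matching length, insert modifies df in place and returns None.
result = df.insert(0, "Atr0", [1] * len(df.index))

print(length_error)
print(result)
print(list(df.columns))
```

Since result is None here, assigning it back to df would discard the DataFrame.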
CodePudding user response:
The length of value is 170, while the df has 305 rows. I'm not sure why you want to create 170 rows of the value 1, but if you change value = [1] * 170 to value = [1] * 305, the error goes away.
However, with df = df.insert(0, "Atr0", value, False), df ends up as None rather than a dataframe, because insert returns None. You need to fix that by changing it to just df.insert(0, "Atr0", value, False). The insert method changes the dataframe in place, so you should not assign its return value back to df. Hope that makes sense.
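Putting both fixes together, a minimal sketch could look like the following. It assumes a small made-up DataFrame instead of haberman.csv (the column names and values are hypothetical), passes a scalar so the row count never has to be hard-coded, and slices x and y from df rather than data so the bias column is included; that last choice is my assumption about the intent.

```python
import pandas as pd

# Hypothetical stand-in for haberman.csv: three features plus a label column.
df = pd.DataFrame({
    "age": [30, 30, 31, 34],
    "year": [64, 62, 65, 59],
    "nodes": [1, 3, 0, 0],
    "status": [1, 1, 1, 2],
})

# Insert the bias column in place; a scalar is broadcast to every row,
# so no hard-coded length is needed. The return value (None) is ignored.
df.insert(0, "Atr0", 1)

# Features are every column except the last; the label is the last column.
x = df.iloc[:, :-1].to_numpy()
y = df.iloc[:, -1].to_numpy()

print(x.shape, y.shape)
```

If you prefer an explicit list, value = [1] * len(df) keeps the length in sync with the data.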