I keep getting a ValueError in this implementation. Please help me pinpoint the issue.

Code:

import numpy as np
import pandas as pd
from math import exp
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

data = pd.read_csv("haberman.csv")

df = pd.DataFrame(data)
value = [1] * 170
#inserting a bias 
df = df.insert(0, "Atr0", value, False)

x = data.iloc[: , :-1]
y = data.iloc[: , -1]
x = np.array(x)
y = np.array(y)

Error:

ValueError: Length of values does not match length of index

haberman.csv has 4 columns from Haberman's Cancer Survival dataset.

Answer:

This error occurs at the line df = df.insert(0, "Atr0", value, False). As the error message indicates, could you please check whether len(value) == len(df.index) holds?
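A minimal sketch of that check, reproducing the error on a small hypothetical DataFrame that stands in for haberman.csv (the column names and values are made up for illustration, not taken from the real file):

```python
import pandas as pd

# Hypothetical stand-in for haberman.csv: 5 rows, 4 columns.
df = pd.DataFrame({
    "age": [30, 31, 33, 34, 35],
    "year": [64, 62, 60, 59, 58],
    "nodes": [1, 3, 0, 2, 4],
    "status": [1, 1, 2, 1, 2],
})

value = [1] * 3  # deliberately the wrong length, like [1] * 170 vs. the real row count
print(len(value), len(df.index))  # 3 5

raised = False
try:
    df.insert(0, "Atr0", value, allow_duplicates=False)
except ValueError as err:
    # pandas rejects a list whose length differs from the number of rows
    raised = True
    print("ValueError:", err)
```

The length check fails before any column is added, so df is left unchanged.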

Another issue: DataFrame.insert(...) modifies the DataFrame in place and returns None, so do not assign its return value back to df.
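A short sketch of the in-place behaviour, using a tiny hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# insert() adds the column to df itself and returns None
result = df.insert(0, "Atr0", [1] * len(df), allow_duplicates=False)

print(result)            # None
print(list(df.columns))  # ['Atr0', 'a', 'b']
```

Writing df = df.insert(...) would therefore replace the modified DataFrame with None.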

Answer:

The length of value is 170, while df has 305 rows. I'm not sure why you want to create 170 rows of the value 1, but if you change value = [1] * 170 to value = [1] * 305 (or, more robustly, value = [1] * len(df)), the error goes away.

However, with df = df.insert(0, "Atr0", value, False), df ends up as None rather than a DataFrame, because insert returns None. Fix that by calling just df.insert(0, "Atr0", value, False). The insert method modifies the DataFrame in place, so you should not assign its result back to df. Hope that makes sense.
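Putting both fixes together, a sketch of the corrected preprocessing might look like the following. It assumes a small hypothetical DataFrame in place of pd.read_csv("haberman.csv") so it runs on its own, and it takes x and y from df (the frame that received the bias column) rather than from data:

```python
import numpy as np
import pandas as pd

# In the question this comes from pd.read_csv("haberman.csv"); this small
# hypothetical frame (made-up column names and values) keeps the sketch self-contained.
df = pd.DataFrame({
    "age": [30, 31, 33, 34, 35, 36],
    "year": [64, 62, 60, 59, 58, 60],
    "nodes": [1, 3, 0, 2, 4, 10],
    "status": [1, 1, 2, 1, 2, 2],
})

# Size the bias column from the data instead of hard-coding a row count.
value = [1] * len(df)

# insert() works in place and returns None, so do not reassign df.
df.insert(0, "Atr0", value, allow_duplicates=False)

# Features: every column except the last (now including Atr0); target: the last column.
x = np.array(df.iloc[:, :-1])
y = np.array(df.iloc[:, -1])

print(x.shape, y.shape)  # (6, 4) (6,)
```

Passing the scalar 1 instead of a list, as in df.insert(0, "Atr0", 1), would also work, since insert broadcasts a scalar to every row.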