I'm trying to replace the NaNs and blank cells in two columns with the mean of each respective column using columns.fillna(column.mean), but I get the error "columns is not defined" when I run the following code.
How do I refer to the columns I passed as a parameter to my data frame so that the columns.fillna(column.mean) calls apply?
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
points = data = pd.read_csv (r'brain_diseases.csv', index_col='id')
df = pd.DataFrame(data, columns= ['cancer','prions'])
columns.fillna(cancer.mean())
columns.fillna(pryons.mean())
kpoints = KMeans(n_clusters=3, init='random').fit(data)
center = kpoints.cluster_centers_
print(center)
plt.scatter(data['trestbps'], data['chol'], c=kpoints.labels_.astype(float), s=50, alpha=0.5)
plt.scatter(center[:, 0], center[:, 1], c='black', s=50)
plt.show()
Any help greatly appreciated.
CodePudding user response:
columns is not defined in your code. The fillna function can be called on the DataFrame:
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
points = data = pd.read_csv (r'brain_diseases.csv', index_col='id')
df = pd.DataFrame(data, columns= ['cancer','prions'])
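# fillna returns a new object, so assign the result back to each column
df['cancer'] = df['cancer'].fillna(df['cancer'].mean())
df['prions'] = df['prions'].fillna(df['prions'].mean())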
...
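The question also mentions blank spaces, which fillna does not treat as missing on its own. Here is a minimal sketch of one way to handle both cases. It assumes a small made-up DataFrame in place of brain_diseases.csv (the values are hypothetical) and that any cell that cannot be parsed as a number should count as missing:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for brain_diseases.csv; the values are made up.
df = pd.DataFrame(
    {
        "cancer": [1.0, np.nan, 3.0, " "],
        "prions": ["2", "", "4", "6"],
    }
)

# Convert both columns to numeric. With errors="coerce", blank or
# whitespace-only cells (and anything else unparseable) become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# Fill each column's NaNs with that column's own mean.
# df.mean() returns one mean per column, and fillna matches them by name.
df = df.fillna(df.mean())

print(df)
```

Note that errors="coerce" silently turns every unparseable value into NaN, not just blanks, so it may be worth checking the raw data first if other stray text is possible.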