- I don't know why the data frame is displayed as 1 row x 30 columns, but when I use np.shape to check it, the shape is (2, 31). Can someone help me out?
- How can I remove the first column in the data frame (the 1 shown in the data frame instead of mean radius)?
import numpy as np
import pandas as pd
import sklearn.datasets
bc = sklearn.datasets.load_breast_cancer(return_X_y = False, as_frame = True)
sk_df = pd.DataFrame(bc.data)
print(bc.target)
bc_df=sk_df.assign(CLASS=bc.target)
bc_df
bcmeans_df=bc_df.groupby("CLASS",as_index=False).mean().diff()
bcmeans_df.dropna().drop(bcmeans_df.columns[[0]], axis=1)
np.shape(bcmeans_df)  # returns (2, 31); np.ndim would only return the number of dimensions, 2
CodePudding user response:
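After running your code, bcmeans_df.shape is (2, 31). Both dropna() and drop() return a new DataFrame instead of modifying bcmeans_df in place, so the result of the following line is lost (it is never assigned to a variable). The 1 row x 30 columns frame you see is presumably just the displayed return value of that expression: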
bcmeans_df.dropna().drop(bcmeans_df.columns[[0]], axis=1)
Is this what you meant?
bcmeans_df = bcmeans_df.dropna().drop(bcmeans_df.columns[[0]], axis=1)
This gives bcmeans_df.shape == (1, 30)
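To make the effect of the missing assignment concrete, here is a minimal sketch on a small made-up frame. The name toy_df and its values are hypothetical stand-ins for the breast cancer data (two features instead of 30), so the shapes are (2, 3) and (1, 2) rather than (2, 31) and (1, 30).

```python
import pandas as pd

# Hypothetical toy data: a class label plus two feature columns.
toy_df = pd.DataFrame({
    "CLASS": [0, 0, 1, 1],
    "mean radius": [17.0, 18.0, 12.0, 13.0],
    "mean texture": [21.0, 22.0, 17.0, 19.0],
})

# Per-class means, then row-to-row differences (the first row becomes NaN).
means_df = toy_df.groupby("CLASS", as_index=False).mean().diff()

# The result of this expression is computed and then thrown away.
means_df.dropna().drop(means_df.columns[[0]], axis=1)
shape_before = means_df.shape

# Assigning the result back is what actually changes means_df.
means_df = means_df.dropna().drop(means_df.columns[[0]], axis=1)
shape_after = means_df.shape

print(shape_before, shape_after)
```

Only the second call changes what means_df refers to; the first one leaves it with the NaN row and the CLASS column still in place.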
Here's another way to get the differences in the means:
import pandas as pd
import sklearn.datasets
bc = sklearn.datasets.load_breast_cancer(return_X_y=False, as_frame=True)
bc_df = pd.concat([bc.data, bc.target], axis=1)
# Per-class means, then the class 1 minus class 0 difference as a Series
bc_diffmeans = bc_df.groupby("target").mean().diff().iloc[1].rename('diffmeans')
assert bc_diffmeans.shape == (30,)
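On the second question: the 1 shown at the left of the one-row frame is most likely not a column but the row index label, left over because dropna() removed row 0 and kept row 1. A rough sketch of two ways to deal with it follows, assuming a hypothetical one-row frame diff_df with made-up values in place of the real result.

```python
import pandas as pd

# Hypothetical one-row frame shaped like the result after dropna():
# the only remaining row carries the index label 1.
diff_df = pd.DataFrame(
    {"mean radius": [-5.0], "mean texture": [-3.5]},
    index=[1],
)

# Option 1: renumber the rows from 0 and discard the old labels.
renumbered = diff_df.reset_index(drop=True)

# Option 2: leave the frame alone and hide the index only when printing.
text = diff_df.to_string(index=False)
print(text)
```

Neither option removes any data; the index is a row label, so there is no first column to drop.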