DataFrame dimension check in pandas and how to remove the column created by the groupby function

Time:04-15

  1. I don't understand why the displayed data frame says it is 1 row × 30 columns, while checking it with np.shape returns (2, 31). Can someone help me out?
  2. How can I remove the first column in the data frame (the 1 that appears in the data frame instead of mean radius)?
import numpy as np
import pandas as pd
import sklearn.datasets

bc = sklearn.datasets.load_breast_cancer(return_X_y=False, as_frame=True)
sk_df = pd.DataFrame(bc.data)
print(bc.target)
bc_df = sk_df.assign(CLASS=bc.target)
bc_df
bcmeans_df = bc_df.groupby("CLASS", as_index=False).mean().diff()
bcmeans_df.dropna().drop(bcmeans_df.columns[[0]], axis=1)
np.ndim(bcmeans_df)

CodePudding user response:

After running your code, bcmeans_df.shape is (2, 31) because dropna() and drop() return new DataFrames, and the result of the following line is discarded (it is not assigned to a variable):

bcmeans_df.dropna().drop(bcmeans_df.columns[[0]], axis=1)

Is this what you meant?

bcmeans_df = bcmeans_df.dropna().drop(bcmeans_df.columns[[0]], axis=1)

This gives bcmeans_df.shape == (1, 30)
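
The same effect can be reproduced on a small frame. This is only a sketch: toy_df, its column names, and its values are made up for illustration and are not the breast cancer data. It shows that the shape stays unchanged until the result of dropna().drop(...) is assigned back.

```python
import numpy as np
import pandas as pd

# Toy stand-in (hypothetical data): two classes, two feature columns.
toy_df = pd.DataFrame({
    "CLASS": [0, 0, 1, 1],
    "a": [1.0, 3.0, 5.0, 7.0],
    "b": [2.0, 2.0, 6.0, 8.0],
})
means_diff = toy_df.groupby("CLASS", as_index=False).mean().diff()

# The result is computed but thrown away, so means_diff still has
# both rows (including the all-NaN first row) and the CLASS column.
means_diff.dropna().drop(means_diff.columns[[0]], axis=1)
shape_unassigned = np.shape(means_diff)

# Assigning the result back is what actually changes the variable.
means_diff = means_diff.dropna().drop(means_diff.columns[[0]], axis=1)
shape_assigned = np.shape(means_diff)

print(shape_unassigned, shape_assigned)
```

Here the first shape still counts the NaN row and the CLASS column, which mirrors the (2, 31) versus (1, 30) difference in the question.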

Here's another way to get the differences in the means:

import pandas as pd
import sklearn.datasets

bc = sklearn.datasets.load_breast_cancer(return_X_y=False, as_frame=True)
bc_df = pd.concat([bc.data, bc.target], axis=1)
# Row 1 of diff() holds mean(class 1) - mean(class 0) for each feature.
bc_diffmeans = bc_df.groupby("target").mean().diff().iloc[1].rename('diffmeans')
assert bc_diffmeans.shape == (30,)
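
As a rough illustration of why this version has no extra column to drop: with the default as_index=True, groupby puts the grouping key into the index rather than into a regular column. The toy frame below is an assumption made for the example, not the real dataset.

```python
import pandas as pd

# Hypothetical toy data: two classes, two feature columns.
toy_df = pd.DataFrame({
    "target": [0, 0, 1, 1],
    "a": [1.0, 3.0, 5.0, 7.0],
    "b": [2.0, 2.0, 6.0, 8.0],
})

# as_index=False keeps the key as a regular column, so it also
# shows up in the result of diff() and has to be dropped later.
with_key_column = toy_df.groupby("target", as_index=False).mean()

# The default (as_index=True) moves the key into the index,
# leaving only the feature columns.
with_key_index = toy_df.groupby("target").mean()

# Row 1 of diff() is mean(class 1) - mean(class 0), as a Series.
diffmeans = with_key_index.diff().iloc[1].rename("diffmeans")

print(with_key_column.shape, with_key_index.shape, diffmeans.shape)
```

Selecting a single row with iloc[1] returns a Series, so the row label no longer appears as a leading value next to the feature columns.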