I have a dataset with 250,000 samples. The column "CHANNEL" has 7 missing values. I want to delete those 7 rows. Here is my code:
mask = df_train["CHANNEL"].notnull()
df_train = df_train[mask]
I checked the shape with:
df_train.shape
It correctly reports 249993 rows. However, when I print the entire dataset, the index still runs from 0 to 249999, as in the picture below:
I also checked the number of missing values in each column of df_train, and every count is zero. This matters because I want to concatenate the data later, and the leftover index causes some issues there. I am not sure whether I missed something in the commands above. I would appreciate any suggestions and comments!
CodePudding user response:
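Try using dropna(). Note that called without arguments it drops rows with a missing value in any column; pass subset=["CHANNEL"] if you only want to check that one column.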
df_train = df_train.dropna()
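You may still see 249999 as the last index label. That is expected: dropping or filtering rows keeps each remaining row's original index label and does not renumber anything, so the 7 removed labels simply leave gaps. To renumber the rows of the new data frame from 0 to 249992, use reset_index(drop=True). The drop=True argument discards the old index instead of adding it back as a column.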
df_train = df_train.dropna()
df_train = df_train.reset_index(drop=True)
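To make the behaviour concrete, here is a minimal sketch on a toy frame. The 6-row data, the VALUE and EXTRA columns, and the names filtered, clean, other, misaligned, and aligned are all made up for illustration; only CHANNEL and df_train come from the question. It also assumes the later concatenation is column-wise (axis=1), which the question does not say, so treat that part as one possible explanation of the issues rather than a diagnosis.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_train: 6 rows, 2 of them missing CHANNEL.
df_train = pd.DataFrame({
    "CHANNEL": ["a", np.nan, "b", "c", np.nan, "d"],
    "VALUE": [1, 2, 3, 4, 5, 6],
})

# Filtering removes the rows but keeps the surviving rows' original labels.
filtered = df_train[df_train["CHANNEL"].notnull()]
print(filtered.shape)           # (4, 2)
print(filtered.index.tolist())  # [0, 2, 3, 5]

# reset_index(drop=True) renumbers the rows 0..n-1 and discards the old labels.
clean = filtered.reset_index(drop=True)
print(clean.index.tolist())     # [0, 1, 2, 3]

# Assumed scenario: a second frame with a default 0..3 index is attached column-wise.
other = pd.DataFrame({"EXTRA": [10, 20, 30, 40]})

# Column-wise concat aligns on index labels, so the gaps reintroduce NaN rows.
misaligned = pd.concat([filtered, other], axis=1)
print(misaligned.shape)         # (5, 3)

# With the index reset, the rows line up one to one.
aligned = pd.concat([clean, other], axis=1)
print(aligned.shape)            # (4, 3)
```

If the concatenation is row-wise instead, passing ignore_index=True to pd.concat renumbers the combined result and is usually enough.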