How to drop the same rows from 3 pandas objects?

Time:03-23

I need to drop the same rows from 2 pandas DataFrames (df_train and X_train) and 1 pandas Series (y_train). All of them have the same number of rows.

This is my current code:

indices = df_train[(df_train.col1!=0) | (df_train.col2!=10)].index

df_train = df_train.drop(indices)
X_train = X_train.drop(indices)
y_train = y_train.drop(indices)

It works for df_train and X_train, but fails for y_train with a KeyError saying that the indices cannot be found in y_train.
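A minimal reproduction, assuming (hypothetically) that df_train and X_train keep non-default labels such as 5, 7, 9 while y_train was rebuilt with a default 0..n-1 index. DataFrame.drop and Series.drop look rows up by label, so labels taken from df_train that do not exist in y_train raise a KeyError even though the row counts match:

```python
import pandas as pd

# Hypothetical data: same row order, but y_train has different index labels.
df_train = pd.DataFrame({"col1": [0, 1, 0], "col2": [10, 10, 3]}, index=[5, 7, 9])
X_train = pd.DataFrame({"feat": [0.1, 0.2, 0.3]}, index=[5, 7, 9])
y_train = pd.Series([1, 0, 1])  # default RangeIndex: 0, 1, 2

# Labels of the rows to drop, taken from df_train's index.
indices = df_train[(df_train.col1 != 0) | (df_train.col2 != 10)].index
print(indices.tolist())  # [7, 9]

df_train = df_train.drop(indices)  # works: labels 7 and 9 exist here
X_train = X_train.drop(indices)    # works: X_train shares df_train's labels
try:
    y_train = y_train.drop(indices)  # labels 7 and 9 do not exist in y_train
except KeyError as err:
    print("KeyError:", err)
```

The data above is made up for illustration; the point is only that the drop is label-based, not position-based.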

Answer:

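The indices are different, but because all three objects have the same number of rows, you can change the logic: build a boolean mask of the rows to keep from df_train, and convert it to a NumPy array for X_train and y_train so it is applied by position instead of by index label: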

m = (df_train.col1==0) & (df_train.col2==10)

df_train = df_train[m]
X_train = X_train[m.to_numpy()]
y_train = y_train[m.to_numpy()]
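A small self-contained run of the mask approach, using hypothetical data where y_train has a default index while the DataFrames use labels 5, 7, 9. Because m.to_numpy() is a plain NumPy boolean array, pandas selects rows by position and ignores the mismatched labels:

```python
import pandas as pd

# Hypothetical data: matching row order, mismatched index labels.
df_train = pd.DataFrame({"col1": [0, 1, 0], "col2": [10, 10, 3]}, index=[5, 7, 9])
X_train = pd.DataFrame({"feat": [0.1, 0.2, 0.3]}, index=[5, 7, 9])
y_train = pd.Series([1, 0, 1])  # default RangeIndex: 0, 1, 2

# Rows to keep: col1 == 0 and col2 == 10.
m = (df_train.col1 == 0) & (df_train.col2 == 10)

# A NumPy boolean array carries no labels, so it is applied by position.
keep = m.to_numpy()
df_train = df_train[m]   # m shares df_train's index, so label alignment is fine
X_train = X_train[keep]
y_train = y_train[keep]

print(X_train.index.tolist(), y_train.index.tolist())  # [5] [0]
```

This relies on the rows of all three objects being in the same order; if they were shuffled independently, positional filtering would silently pair the wrong rows.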

Or build the mask from the drop condition in the question and invert it:

m = (df_train.col1!=0) | (df_train.col2!=10)

df_train = df_train[~m]
X_train = X_train[~m.to_numpy()]
y_train = y_train[~m.to_numpy()]

Your original solution works only if all 3 pandas objects have the same index, so another fix is to reset the indices first:

df_train = df_train.reset_index(drop=True)
X_train = X_train.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)
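A sketch of the reset-then-drop route with the same hypothetical data. After reset_index(drop=True) every object is labelled 0..n-1, so the label-based drop from the question succeeds for all three:

```python
import pandas as pd

# Hypothetical data: same row order, different index labels.
df_train = pd.DataFrame({"col1": [0, 1, 0], "col2": [10, 10, 3]}, index=[5, 7, 9])
X_train = pd.DataFrame({"feat": [0.1, 0.2, 0.3]}, index=[5, 7, 9])
y_train = pd.Series([1, 0, 1])

# Relabel every object as 0..n-1 (assumes the rows are in the same order).
df_train = df_train.reset_index(drop=True)
X_train = X_train.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)

# The original label-based code now works for all three objects.
indices = df_train[(df_train.col1 != 0) | (df_train.col2 != 10)].index
df_train = df_train.drop(indices)
X_train = X_train.drop(indices)
y_train = y_train.drop(indices)

print(indices.tolist(), y_train.tolist())  # [1, 2] [1]
```

Note that drop=True discards the old labels; use drop=False if they need to be kept as a column.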

Answer:

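Another option is to define a boolean mask and apply it with .loc. Since .loc aligns a boolean Series on index labels, this works only when all three objects share the same index: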

# note the mask is inverted (spotted this after checking @jezrael's answer)
mask = ~((df_train.col1!=0) | (df_train.col2!=10))


df_train = df_train.loc[mask]
X_train = X_train.loc[mask]
y_train = y_train.loc[mask]
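A hedged check of the .loc approach with hypothetical data. When all three objects share the labels 5, 7, 9 the mask applies cleanly; a Series with different labels (hypothetical y_other below) cannot be aligned with the mask, and pandas raises an error instead of filtering:

```python
import pandas as pd

# Hypothetical data where all three objects share the same index.
idx = [5, 7, 9]
df_train = pd.DataFrame({"col1": [0, 1, 0], "col2": [10, 10, 3]}, index=idx)
X_train = pd.DataFrame({"feat": [0.1, 0.2, 0.3]}, index=idx)
y_train = pd.Series([1, 0, 1], index=idx)

# Inverted drop condition: True for the rows to keep.
mask = ~((df_train.col1 != 0) | (df_train.col2 != 10))

df_train = df_train.loc[mask]
X_train = X_train.loc[mask]
y_train = y_train.loc[mask]
print(y_train.index.tolist())  # [5]

# Hypothetical Series whose labels (0, 1, 2) do not match the mask's labels.
y_other = pd.Series([1, 0, 1])
unaligned_failed = False
try:
    y_other.loc[mask]
except Exception:  # pandas reports an unalignable boolean Series indexer
    unaligned_failed = True
print(unaligned_failed)
```

So with mismatched indices, as in the question, this option needs the reset_index step first.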