I need to drop the same rows from 2 pandas DataFrames (df_train and X_train) and 1 pandas Series (y_train). All of them have the same number of rows.
This is my current code:
indices = df_train[(df_train.col1!=0) | (df_train.col2!=10)].index
df_train = df_train.drop(indices)
X_train = X_train.drop(indices)
y_train = y_train.drop(indices)
It works well for df_train and X_train, but fails for y_train, saying that indices cannot be found in y_train (KeyError).
CodePudding user response:
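The three objects have different indices, but since they have the same number of rows you can change the logic: build a boolean mask from the conditions on df_train and convert it to a NumPy array before applying it to X_train and y_train, so that rows are selected by position instead of by index label: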
m = (df_train.col1==0) & (df_train.col2==10)
df_train = df_train[m]
X_train = X_train[m.to_numpy()]
y_train = y_train[m.to_numpy()]
Or keep your original condition for the rows to drop and invert the mask:
m = (df_train.col1!=0) | (df_train.col2!=10)
df_train = df_train[~m]
X_train = X_train[~m.to_numpy()]
y_train = y_train[~m.to_numpy()]
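As a minimal sketch of the positional-mask approach above: the toy data here is hypothetical (it is not from the question) and is built so that y_train carries different index labels than df_train, which is presumably what triggers the KeyError.

```python
import pandas as pd

# Hypothetical toy data: same number of rows, but y_train has different index labels.
df_train = pd.DataFrame({"col1": [0, 1, 0, 0], "col2": [10, 10, 5, 10]})
X_train = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0]})
y_train = pd.Series([1, 0, 1, 0], index=[10, 11, 12, 13])

# Rows to keep: col1 == 0 and col2 == 10.
# to_numpy() drops the index, so the mask is applied by position.
keep = ((df_train["col1"] == 0) & (df_train["col2"] == 10)).to_numpy()

df_train = df_train[keep]
X_train = X_train[keep]
y_train = y_train[keep]

print(y_train)
```

Because the mask is a plain NumPy array, pandas does no label alignment, so this only relies on the three objects having rows in the same order.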
Your solution works if all 3 pandas objects have the same index, for example after resetting it on each of them before computing indices:
df_train = df_train.reset_index(drop=True)
X_train = X_train.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)
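A small sketch of how the reset fits together with the original drop-based code, assuming hypothetical toy data in which y_train starts out with unrelated index labels:

```python
import pandas as pd

# Hypothetical toy data: y_train's labels do not match those of df_train / X_train.
df_train = pd.DataFrame({"col1": [0, 1, 0, 0], "col2": [10, 10, 5, 10]}, index=[5, 6, 7, 8])
X_train = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0]}, index=[5, 6, 7, 8])
y_train = pd.Series([1, 0, 1, 0], index=[100, 101, 102, 103])

# Give all three objects the same default 0..n-1 index first.
df_train = df_train.reset_index(drop=True)
X_train = X_train.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)

# Now the labels computed from df_train exist in every object.
indices = df_train[(df_train["col1"] != 0) | (df_train["col2"] != 10)].index
df_train = df_train.drop(indices)
X_train = X_train.drop(indices)
y_train = y_train.drop(indices)

print(y_train)
```

Note that the reset has to happen before indices is computed; otherwise the labels still refer to the old index of df_train.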
CodePudding user response:
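Another option is to define a boolean mask and apply it with .loc (since .loc aligns a boolean Series by index label, this needs all three objects to share the same index):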
# note the mask is inverted (spotted this after checking @jezrael's answer)
mask = ~((df_train.col1!=0) | (df_train.col2!=10))
df_train = df_train.loc[mask]
X_train = X_train.loc[mask]
y_train = y_train.loc[mask]
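Since .loc matches a boolean Series to rows by label, one way to make this work when y_train has different labels is to copy the index of df_train onto it first. The sketch below uses hypothetical toy data and the hypothetical names y_aligned and y_kept, and assumes the rows of both objects are in the same order:

```python
import pandas as pd

# Hypothetical toy data: same length and row order, different index labels.
df_train = pd.DataFrame({"col1": [0, 1, 0, 0], "col2": [10, 10, 5, 10]})
y_train = pd.Series([1, 0, 1, 0], index=[10, 11, 12, 13])

# Inverted mask: True for the rows to keep.
mask = ~((df_train["col1"] != 0) | (df_train["col2"] != 10))

# Copy df_train's labels onto a copy of y_train (positional, so row order must match).
y_aligned = y_train.copy()
y_aligned.index = df_train.index

# With matching labels, the label-aligned boolean mask can be applied via .loc.
y_kept = y_aligned.loc[mask]

print(y_kept)
```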