I have a pandas DataFrame that contains some null values, and I want to add a new column, model_prediction,
which holds the model's predictions on the data.
My model does not accept null values, so I want model_prediction
to be NaN for those rows. The DataFrame is very large and df.iterrows is very slow, so I want to avoid it.
CodePudding user response:
Assuming your dataframe is df and your model is model, please try this:
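import numpy as np
import pandas as pd

# Reset to a unique 0..n-1 index so sort_index() can restore the row order later
df = df.reset_index(drop=True)
# Rows with at least one null: the prediction stays NaN
df_na = df[df.isna().any(axis=1)].copy()
df_na['model_prediction'] = np.nan
# Complete rows: predict on all of them in a single call
df_model = df.dropna().copy()
df_model['model_prediction'] = model.predict(df_model.values)
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat([df_model, df_na]).sort_index()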
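If you would rather not split and recombine the frame, a boolean mask should give the same result in one pass. This is only a minimal sketch: MeanModel is a hypothetical stand-in for your model (I am assuming yours exposes a predict method that takes a 2-D array), and add_predictions is a helper name I made up for illustration.

```python
import numpy as np
import pandas as pd


class MeanModel:
    """Hypothetical stand-in for the real model: predicts the mean of each row."""

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        if np.isnan(X).any():
            raise ValueError("model does not accept null values")
        return X.mean(axis=1)


def add_predictions(df, model, column="model_prediction"):
    """Return a copy of df with predictions for complete rows and NaN elsewhere."""
    out = df.copy()
    # True for rows that contain no nulls (computed before the new column exists)
    complete = out.notna().all(axis=1)
    # Default value, kept for every row that has at least one null
    out[column] = np.nan
    if complete.any():
        # One predict call on the complete rows only; no Python-level row loop
        out.loc[complete, column] = model.predict(df.loc[complete].to_numpy())
    return out


df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
result = add_predictions(df, MeanModel())
print(result)
```

Because the assignment goes through .loc with the mask, the original index and row order are left untouched, so no reset_index or sort_index is needed.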