I am working with a large pandas dataframe and a few columns have lots of missing data. I am not totally confident with my imputation and I believe the presence or absence of data for these variables could be useful information, so I would like to add another column of the dataframe with 0 where the entry is missing and 1 otherwise. Is there a quick/efficient way to do this in pandas?
Answer:
Try out the following:
df['New_Col'] = df['Col'].notna().astype('uint8')
Here Col is your column containing np.nan values, and New_Col is the binary indicator column: 1 where Col has a value and 0 where it is np.nan.
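A small self-contained check of this approach. It assumes a toy DataFrame with a single float column named `Col`, and the data is made up purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Col has two missing entries
df = pd.DataFrame({"Col": [1.5, np.nan, 3.0, np.nan]})

# notna() is True where a value is present; astype('uint8') maps True/False to 1/0
df["New_Col"] = df["Col"].notna().astype("uint8")

print(df["New_Col"].tolist())  # [1, 0, 1, 0]
```

Using `uint8` keeps the indicator at one byte per row, which can matter on a large frame. `.astype(int)` also works but produces a wider integer dtype (typically `int64`).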
Another answer:
The relevant method here is .notna(), which returns a boolean Series that is True where a value is present and False where it is missing. To apply it to multiple columns of interest, use:
# cols_of_interest is a list of the column names to flag
for c in cols_of_interest:
    df[f'{c}_not_missing'] = 1 * df[c].notna()
Note that multiplying a boolean Series by 1 gives integer 0/1 values.
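The per-column loop can also be replaced by one vectorized call on the column subset. This is a sketch that assumes `cols_of_interest` is a list of column names, as in the loop, and that the indicators are created before any imputation, since a `fillna` run first would erase the missingness you want to record. The example data, column names, and mean imputation are hypothetical and only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical example data with missing values in columns 'a' and 'b'
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, "x", "y"],
    "c": [7, 8, 9],
})
cols_of_interest = ["a", "b"]

# notna() on the subset, cast bool -> 0/1, rename to e.g. 'a_not_missing'
indicators = (
    df[cols_of_interest]
    .notna()
    .astype("uint8")
    .add_suffix("_not_missing")
)
df = df.join(indicators)  # same index, so rows line up

# Impute only after the indicators exist (mean imputation as an example)
df["a"] = df["a"].fillna(df["a"].mean())

print(df["a_not_missing"].tolist())  # [1, 0, 1]
print(df["b_not_missing"].tolist())  # [0, 1, 1]
```

Doing it in one call avoids repeated column insertion on a wide frame. The loop version is equally correct and may be easier to read when only a few columns are involved.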