One hot vector in pandas to encode missing values

Time:06-12

I am working with a large pandas dataframe, and a few columns have lots of missing data. I am not totally confident in my imputation, and I believe the presence or absence of data for these variables could be useful information, so I would like to add another column to the dataframe with 0 where the entry is missing and 1 otherwise. Is there a quick/efficient way to do this in pandas?

CodePudding user response:

Try out the following:

df['New_Col'] = df['Col'].notna().astype('uint8')

Here Col is your column containing np.nan values, and New_Col is the binary indicator column: 1 where Col has a value and 0 where it is missing.
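To see the one-liner in action, here is a minimal sketch on a toy dataframe. The data values are made up for illustration; only the Col and New_Col names come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; 'Col' stands in for a column with missing values.
df = pd.DataFrame({'Col': [1.5, np.nan, 3.0, np.nan]})

# notna() is True where a value is present; casting to uint8 turns that into 1/0.
df['New_Col'] = df['Col'].notna().astype('uint8')

print(df['New_Col'].tolist())  # [1, 0, 1, 0]
```

Using uint8 keeps the indicator at one byte per row, which can matter on a large dataframe.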

CodePudding user response:

The relevant function here is .notna, which returns a boolean Series: True where the value is present and False where it is missing. To apply it to multiple columns of interest, use:

for c in cols_of_interest:
    df[f'{c}_not_missing'] = 1 * df[c].notna()

Note that multiplying a boolean Series by 1 gives integer 0/1 values.
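As a possible alternative to the loop, the indicator columns can be built in one vectorized step. This is only a sketch: the column names and data below are hypothetical, and cols_of_interest is assumed to be a list of column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with missing entries in several columns.
df = pd.DataFrame({
    'age': [25.0, np.nan, 40.0],
    'income': [np.nan, np.nan, 52000.0],
    'city': ['Oslo', None, 'Lima'],
})
cols_of_interest = ['age', 'income']  # assumed: the columns to flag

# Build all indicator columns at once: 1 where a value is present, 0 where missing.
flags = (
    df[cols_of_interest]
    .notna()
    .astype('uint8')
    .add_suffix('_not_missing')
)

# Attach the indicators to the original dataframe (aligned on the index).
df = df.join(flags)

print(df[['age_not_missing', 'income_not_missing']])
```

This should produce the same 0/1 values as the loop, with column names following the same _not_missing pattern.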
