Home > Software engineering >  How to bulild a null notnull matrix in pandas dataframe
How to bulild a null notnull matrix in pandas dataframe

Time:04-08

Here's my dataset

Id    Column_A    Column_B    Column_C
1         Null           7        Null
2            8           7        Null
3         Null           8           7
4            8        Null           8

Here's my expected output

             Column_A    Column_B     Column_C    Total
Null              2            1            2         5
Notnull           2            3            2         7

CodePudding user response:

Assuming Null is NaN, here's one option. Using isna sum to count the NaNs, then find the difference between df length and number of NaNs for Notnulls. Then construct a DataFrame.

nulls = df.drop(columns='Id').isna().sum()
notnulls = nulls.rsub(len(df))
out = pd.DataFrame.from_dict({'Null':nulls, 'Notnull':notnulls}, orient='index')
out['Total'] = out.sum(axis=1)

If you're into one liners, we could also do:

out = (df.drop(columns='Id').isna().sum().to_frame(name='Nulls')
       .assign(Notnull=df.drop(columns='Id').notna().sum()).T
       .assign(Total=lambda x: x.sum(axis=1)))

Output:

         Column_A  Column_B  Column_C  Total
Nulls           2         1         2      5
Notnull         2         3         2      7

CodePudding user response:

Use Series.value_counts for non missing values:

df = (df.replace('Null', np.nan)
        .set_index('Id', 1)
        .notna()
        .apply(pd.value_counts)
        .rename({True:'Notnull', False:'Null'}))
df['Total'] = df.sum(axis=1)

print (df)
         Column_A  Column_B  Column_C  Total
Null            2         1         2      5
Notnull         2         3         2      7
  • Related