How to bulild a null notnull matrix in pandas dataframe-CodePudding

Here's my dataset

Id    Column_A    Column_B    Column_C
1         Null           7        Null
2            8           7        Null
3         Null           8           7
4            8        Null           8

Here's my expected output

             Column_A    Column_B     Column_C    Total
Null              2            1            2         5
Notnull           2            3            2         7

CodePudding user response：

Assuming Null is NaN, here's one option. Using isna sum to count the NaNs, then find the difference between df length and number of NaNs for Notnulls. Then construct a DataFrame.

nulls = df.drop(columns='Id').isna().sum()
notnulls = nulls.rsub(len(df))
out = pd.DataFrame.from_dict({'Null':nulls, 'Notnull':notnulls}, orient='index')
out['Total'] = out.sum(axis=1)

If you're into one liners, we could also do:

out = (df.drop(columns='Id').isna().sum().to_frame(name='Nulls')
       .assign(Notnull=df.drop(columns='Id').notna().sum()).T
       .assign(Total=lambda x: x.sum(axis=1)))

Output:

         Column_A  Column_B  Column_C  Total
Nulls           2         1         2      5
Notnull         2         3         2      7

CodePudding user response：

Use Series.value_counts for non missing values:

df = (df.replace('Null', np.nan)
        .set_index('Id', 1)
        .notna()
        .apply(pd.value_counts)
        .rename({True:'Notnull', False:'Null'}))
df['Total'] = df.sum(axis=1)

print (df)
         Column_A  Column_B  Column_C  Total
Null            2         1         2      5
Notnull         2         3         2      7