So I have a pandas dataframe that has 2 columns:
import numpy as np
data = {'Column_A': [1,2,3,4,5,np.nan,6,7,np.nan,np.nan,8,9,10,np.nan], 'Column_B': [1,2,3,4,5,np.nan,6,7,np.nan,np.nan,8,9,10,11]}
df = pd.DataFrame(data)
I wanted to count the number of null and non-null values for each column so I used df.isnull().sum() and df.notnull().sum() and that gives me a list of all my columns with the count of null and not null for each column:
df.isnull().sum()
Column_A 4
Column_B 3
df.notnull().sum()
Column_A 10
Column_B 11
What I want to do is add the count of null and not null for each column to the dataframe and group by all the columns with the original columns called Column Names. So for example the dataframe would look like this:
Column_Names Count_Null Count_NotNull
Column_A 4 10
Column_B 3 11
And so on.
I can run df.isnull().sum() and df.notnull().sum() separately to get the correct counts for all the columns.
But how can I create a table that has the original columns on rows, with the two metrics as additional columns like in the example above?
CodePudding user response:
I'd do it like this:
names = {False: 'Count_NotNull', True: 'Count_Null'}
result = df.isna().apply(pd.Series.value_counts, dropna=False).rename(names).T
edit - or without apply
:
result = df.isna().sum().rename('Count_Null').to_frame()
result['Count_NotNull'] = len(df) - result['Count_Null']