Python: How to get frequency and percentage of null and non-null values in a column

I have a dataset that looks like this:

col1
100
NaN
100
NaN
200
100
NaN
150

And I want my output to look something like this:

           Frequency     Percent     Cumulative Frequency     Cumulative Percent
Non-Null   5             62.5        5                        62.5
Null       3             37.5        8                        100

I want to break my data down by null and non-null values and output the frequency, percent, cumulative frequency, and cumulative percent in one table.
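For reference, the expected table follows from simple arithmetic. Percent is each frequency divided by the total number of rows (8) times 100, and the cumulative columns are running sums down the rows. Below is a minimal plain-Python sketch of that arithmetic, with no pandas. It assumes the column is held in a list and uses math.nan to mark missing values.

```python
import math
from itertools import accumulate

# The example column; math.nan stands in for the missing (NaN) values
col1 = [100, math.nan, 100, math.nan, 200, 100, math.nan, 150]

null = sum(1 for v in col1 if math.isnan(v))  # number of missing values
non_null = len(col1) - null                    # everything else

freq = [non_null, null]                           # rows: Non-Null, Null
percent = [f / len(col1) * 100 for f in freq]     # share of all rows, in %
cum_freq = list(accumulate(freq))                 # running total of counts
cum_percent = list(accumulate(percent))           # running total of percents

for label, row in zip(["Non-Null", "Null"], zip(freq, percent, cum_freq, cum_percent)):
    print(label, *row)
```

The last cumulative value always equals the row count (8) and 100 percent, which is a quick way to confirm nothing was dropped.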

Answer:

You can use:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [100, np.nan, 100, np.nan, 200, 100, np.nan, 150]})

out = (df['col1']
 .isna().value_counts() # count null (True) / non-null (False)
 .reindex([False, True], fill_value=0) # ensure both rows exist, in this order
 .set_axis(['Non-Null', 'Null']) # rename index
 .to_frame('Frequency') # Series to DataFrame
 # calculate percent from frequency
 .assign(Percent=lambda d: d['Frequency'].div(d['Frequency'].sum()).mul(100))
 # calculate cumulative sums of all columns and join them
 .pipe(lambda d: d.join(d.cumsum().add_prefix('Cumulative ')))
 )

print(out)

Output:

          Frequency  Percent  Cumulative Frequency  Cumulative Percent
Non-Null          5     62.5                     5                62.5
Null              3     37.5                     8               100.0
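As a variant, the Percent column can come straight from value_counts(normalize=True), which returns proportions instead of counts. The sketch below wraps that idea in a reusable helper so it can be applied to any column. The name null_frequency_table is hypothetical, and the helper assumes its input is a pandas Series.

```python
import numpy as np
import pandas as pd


def null_frequency_table(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: null/non-null frequency table for one Series."""
    is_null = s.isna()
    # Counts for False (non-null) and True (null); reindex forces both rows
    freq = is_null.value_counts().reindex([False, True], fill_value=0)
    # normalize=True gives proportions; scale to percent
    pct = is_null.value_counts(normalize=True).reindex([False, True], fill_value=0) * 100
    return pd.DataFrame(
        {
            "Frequency": freq.to_numpy(),
            "Percent": pct.to_numpy(),
            "Cumulative Frequency": freq.cumsum().to_numpy(),
            "Cumulative Percent": pct.cumsum().to_numpy(),
        },
        index=["Non-Null", "Null"],
    )


df = pd.DataFrame({"col1": [100, np.nan, 100, np.nan, 200, 100, np.nan, 150]})
table = null_frequency_table(df["col1"])
print(table)
```

Because the helper takes a single Series, you could call it once per column, for example in a loop over df.columns.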

Answer:

dataframe.isnull().sum()

Apply it to the column you need (for example df['col1'].isnull().sum()) to get the count of null values. You can then build the table from that count separately.
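Here is a minimal sketch of that approach on the example column col1. It takes the null count from isnull().sum(), derives the non-null count from the total number of rows, and then adds the percent and cumulative columns one at a time.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [100, np.nan, 100, np.nan, 200, 100, np.nan, 150]})

total = len(df)                                  # all rows, null or not
null_count = int(df["col1"].isnull().sum())      # count of NaN values
non_null_count = total - null_count              # remaining rows

# Build the table row by row, then derive the other columns from Frequency
out = pd.DataFrame({"Frequency": [non_null_count, null_count]}, index=["Non-Null", "Null"])
out["Percent"] = out["Frequency"] / total * 100
out["Cumulative Frequency"] = out["Frequency"].cumsum()
out["Cumulative Percent"] = out["Percent"].cumsum()
print(out)
```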