I have a dataset that looks like this:
col1
100
NaN
100
NaN
200
100
NaN
150
And I want my output to look something like this:
          Frequency  Percent  Cumulative Frequency  Cumulative Percent
Non-Null          5     62.5                     5                62.5
Null              3     37.5                     8                 100
I want to break my data down by null and non-null values and output the frequency, percent, cumulative frequency, and cumulative percent in one table.
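For reference, these are the four requested columns as formulas. Here $f_i$ is the count in row $i$ (Non-Null, then Null), $C_k$ is the running count, and $N$ is the total number of rows. The right-hand side works them through for the sample data, where $N = 8$:

```latex
\begin{aligned}
\text{Percent}_i &= 100 \cdot \frac{f_i}{N}
  &&\Rightarrow\; 100 \cdot \tfrac{5}{8} = 62.5,\quad 100 \cdot \tfrac{3}{8} = 37.5 \\
C_k &= \sum_{i=1}^{k} f_i
  &&\Rightarrow\; 5,\quad 5 + 3 = 8 \\
\text{Cumulative Percent}_k &= 100 \cdot \frac{C_k}{N}
  &&\Rightarrow\; 62.5,\quad 100 \cdot \tfrac{8}{8} = 100
\end{aligned}
```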
Answer:
You can use:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [100, np.nan, 100, np.nan, 200, 100, np.nan, 150]})

out = (df['col1']
       .isna().value_counts()                  # count null / non-null
       .reindex([False, True], fill_value=0)   # ensure both rows exist
       .set_axis(['Non-Null', 'Null'])         # rename index
       .to_frame('Frequency')                  # Series to DataFrame
       # calculate percent from frequency
       .assign(Percent=lambda d: d['Frequency'].div(d['Frequency'].sum()).mul(100))
       # calculate cumulative sums of all columns and join them back
       .pipe(lambda d: d.join(d.cumsum().add_prefix('Cumulative ')))
      )
print(out)
Output:
          Frequency  Percent  Cumulative Frequency  Cumulative Percent
Non-Null          5     62.5                     5                62.5
Null              3     37.5                     8               100.0
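The following is an alternative sketch and is not part of the original answer. The percent column can come directly from `value_counts(normalize=True)`, which returns proportions instead of counts, and the steps can be wrapped in a reusable function. The name `null_frequency_table` is hypothetical. The sketch assumes the input is a pandas Series.

```python
import numpy as np
import pandas as pd


def null_frequency_table(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: frequency table of non-null vs null values in a Series."""
    is_null = s.isna()
    # counts of False (non-null) and True (null); reindex keeps both rows even if one is absent
    freq = is_null.value_counts().reindex([False, True], fill_value=0)
    # proportions of each, turned into percentages
    pct = is_null.value_counts(normalize=True).reindex([False, True], fill_value=0.0) * 100
    table = pd.DataFrame({'Frequency': freq.to_numpy(), 'Percent': pct.to_numpy()},
                         index=['Non-Null', 'Null'])
    # running totals of both columns, joined back with a 'Cumulative ' prefix
    return table.join(table.cumsum().add_prefix('Cumulative '))


df = pd.DataFrame({'col1': [100, np.nan, 100, np.nan, 200, 100, np.nan, 150]})
print(null_frequency_table(df['col1']))
```

The function takes a Series rather than a column name, so it can be applied to any column, for example `df.apply(null_frequency_table)` is not needed; just call it once per column of interest.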
Answer:
dataframe.isnull().sum()
Applied to the column you need, for example df['col1'].isnull().sum(), this gives the count of null values; you can then build the frequency table from that count separately.
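A minimal sketch of the "build the table separately" step. It assumes the question's `df` with a single `col1` column. The null count comes from `isnull().sum()`, and the non-null count is the total length minus that. The four columns are then assembled by hand. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [100, np.nan, 100, np.nan, 200, 100, np.nan, 150]})

n_null = int(df['col1'].isnull().sum())  # number of NaN values
n_total = len(df['col1'])                # all rows, null or not
n_non_null = n_total - n_null            # equivalently df['col1'].count()

table = pd.DataFrame({'Frequency': [n_non_null, n_null]}, index=['Non-Null', 'Null'])
table['Percent'] = table['Frequency'] / n_total * 100
table['Cumulative Frequency'] = table['Frequency'].cumsum()
table['Cumulative Percent'] = table['Percent'].cumsum()
print(table)
```

This assumes the column is non-empty. With zero rows, the percent division would produce NaN.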