Is there a way to return what percent of the data satisfies a set criteria?

I have a dataset that looks like this:

data

I can't really wrap my head around how to proceed with this. It's pretty straightforward: there are 33 observations for each variable, and only 3 are not null in mean_sleep_time and mean_bed_time. All I want is a piece of code that returns the percentage of total values. In short, a data frame that shows "out of 33 users only 3 have input valid data".

CodePudding user response：

This is my approach:

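library(dplyr) # for n_distinct

# Count all users before filtering
my_tot_user <- n_distinct(my_df$id)
# Keep only the rows that have no NA in any column
my_df <- my_df[complete.cases(my_df),]
# Count the users that remain after filtering
my_user <- n_distinct(my_df$id)
paste("out of ", my_tot_user, " users only ", my_user, " have input valid data", sep = "")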

The function n_distinct returns the number of unique values. The function complete.cases lets us keep every row that doesn't contain any NA.
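Since the question asks for the result as a data frame, here is a minimal sketch of the same idea on made-up data. The toy values and the names valid, total_users, valid_users and summary_df are assumptions for illustration, not part of the original data. It checks for NA only in the two columns of interest and uses base R's length(unique(...)) in place of n_distinct, so it runs without dplyr.

```r
# Toy data (hypothetical): 5 users, only 2 with both values present
my_df <- data.frame(
  id = 1:5,
  mean_sleep_time = c(7.5, NA, NA, 6.0, NA),
  mean_bed_time   = c(23.0, NA, 22.5, 0.5, NA)
)

# TRUE for rows with no NA in the two columns of interest
valid <- complete.cases(my_df[, c("mean_sleep_time", "mean_bed_time")])

# length(unique(x)) is the base-R counterpart of n_distinct(x)
total_users <- length(unique(my_df$id))
valid_users <- length(unique(my_df$id[valid]))

# One-row data frame with the counts and the percentage
summary_df <- data.frame(
  total_users   = total_users,
  valid_users   = valid_users,
  percent_valid = 100 * valid_users / total_users
)
print(summary_df)
```

Restricting complete.cases to the two columns avoids dropping a user because of an NA in some unrelated column; pass the whole data frame instead if every column must be filled in.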

CodePudding user response：

Short & sweet. Assuming your dataframe is called df, then:

nrow(df[!is.na(df$mean_sleep_time) & !is.na(df$mean_bed_time), ]) / nrow(df) * 100 for percentage

and

nrow(df[!is.na(df$mean_sleep_time) & !is.na(df$mean_bed_time), ]) for complete cases

paste("Out of ", nrow(df), " users only ", nrow(df[!is.na(df$mean_sleep_time) & !is.na(df$mean_bed_time), ]), " have valid input data", sep = "")
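A compact variant, sketched on made-up data: because TRUE counts as 1, sum() of a logical vector gives the number of valid rows and mean() gives their share directly. The toy values and the names both_present, n_valid, pct_valid and msg are assumptions for illustration, and the sketch assumes one row per user.

```r
# Toy data (hypothetical), one row per user
df <- data.frame(
  id = 1:4,
  mean_sleep_time = c(8, NA, 7, NA),
  mean_bed_time   = c(23, 22, NA, NA)
)

# TRUE only where both columns are present
both_present <- !is.na(df$mean_sleep_time) & !is.na(df$mean_bed_time)

n_valid   <- sum(both_present)         # number of TRUE values
pct_valid <- mean(both_present) * 100  # share of TRUE values, as a percentage

msg <- sprintf("Out of %d users only %d (%.1f%%) have valid input data",
               nrow(df), n_valid, pct_valid)
print(msg)
```

Testing each column with is.na separately, rather than combining the columns first, keeps the check correct for any column type and for values such as 0.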