I have a dataset that looks like this:
I can't really wrap my head around how to proceed with this. It's pretty straightforward: there are 33 observations for each variable, and only 3 are not null in mean_sleep_time and mean_bed_time. All I want is a piece of code that returns the non-null values as a percentage of the total. In short, a data frame that shows "out of 33 users only 3 have input valid data".
CodePudding user response:
This is my approach:
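library(dplyr) # for n_distinct
# number of distinct users before filtering
my_tot_user <- n_distinct(my_df$id)
# keep only the rows with no NA in any column
my_df <- my_df[complete.cases(my_df),]
# number of distinct users that remain
my_user <- n_distinct(my_df$id)
paste("out of ", my_tot_user, " users only ", my_user, " have input valid data", sep = "")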
The function n_distinct returns the number of unique values. The function complete.cases flags the rows that contain no NA, so subsetting with it keeps only those rows.
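To make the behaviour of complete.cases concrete, here is a small sketch on a made-up data frame (toy_df and its values are invented for illustration, they are not the data from the question). It restricts the check to the two columns of interest, which may be safer if other columns can also contain NA, and uses length(unique(...)) as a base-R stand-in for n_distinct.

```r
# Toy data, invented for illustration only
toy_df <- data.frame(
  id              = 1:5,
  mean_sleep_time = c(7.5, NA, 6.0, NA, 8.0),
  mean_bed_time   = c(23.0, NA, NA, 22.5, 0.5)
)

# TRUE for rows where both columns are non-missing
ok <- complete.cases(toy_df[, c("mean_sleep_time", "mean_bed_time")])

# Keep only the complete rows (here: ids 1 and 5)
valid_df <- toy_df[ok, ]

# Base-R equivalent of dplyr::n_distinct(valid_df$id)
n_valid_users <- length(unique(valid_df$id))
```

Only the rows where neither column is NA survive, so n_valid_users is 2 for this toy input.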
CodePudding user response:
Short & sweet. Assuming your dataframe is called df, then:
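sum(!is.na(df$mean_sleep_time) & !is.na(df$mean_bed_time))/nrow(df)*100
for the percentage
and
sum(!is.na(df$mean_sleep_time) & !is.na(df$mean_bed_time))
for the number of complete cases. Note that sum() on the logical vector counts the matching rows, whereas length() on a data frame returns its number of columns.
paste("Out of ", nrow(df), " users only ", sum(!is.na(df$mean_sleep_time) & !is.na(df$mean_bed_time)), " have valid input data", sep = "")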
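Since the question asks for a data frame rather than a sentence, the same counts could be collected into a one-row summary. This is only a sketch: summarise_valid and demo_df are hypothetical names, the demo values are made up, and it assumes one row per user (as the "33 observations" in the question suggests).

```r
# Hypothetical helper: one-row summary of how many rows have all of `cols` non-missing
summarise_valid <- function(data, cols) {
  n_total <- nrow(data)
  n_valid <- sum(complete.cases(data[, cols, drop = FALSE]))
  data.frame(
    total_users = n_total,
    valid_users = n_valid,
    pct_valid   = round(100 * n_valid / n_total, 1)
  )
}

# Made-up data: 33 users, only the first 3 have both values filled in
demo_df <- data.frame(
  id              = 1:33,
  mean_sleep_time = c(7, 6.5, 8, rep(NA, 30)),
  mean_bed_time   = c(23, 22.5, 0, rep(NA, 30))
)

res <- summarise_valid(demo_df, c("mean_sleep_time", "mean_bed_time"))
res
```

For this demo input the result has total_users = 33, valid_users = 3 and pct_valid = 9.1.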