Get the percentage of empty cells in a column for a certain group

Time:10-30

I have a dataframe like this:

df <- data.frame(
  name = c("Jon", "Bill", "Maria", "Ben", "Emma", "Jon", "Bill", "Maria", "Ben", "Emma", "Jon", "Bill", "Maria", "Ben", "Emma"),
  data = c(1, "", 3, 1, "", 3, 4, "", 1, "", 1, 3, 3, 1, 3)
)

I would like to compute the percentage of empty cells for each person and store it in another dataframe. The result should look like this:

df_result<-data.frame(name=c("Ben", "Bill", "Emma", "Jon", "Maria"), percentage=c(0, 0.33, 0.66, 0, 0.33))

I have tried the group_by function from the dplyr package.

df_result<-df%>%group_by(name)%>%summarise(miss_count=count(data))
df_result$percentage<-1-(df_result$miss_count/3)

I have searched for an answer for a long time. If this is a duplicate, I am sincerely sorry.

Thanks in advance!

CodePudding user response:

We can take the mean of a logical vector to get the proportion of empty cells (multiply by 100 for a percentage):

library(dplyr)
df %>% 
   group_by(name) %>%
   summarise(miss_count = mean(data == ''))

-output

# A tibble: 5 × 2
  name  miss_count
  <chr>      <dbl>
1 Ben        0    
2 Bill       0.333
3 Emma       0.667
4 Jon        0    
5 Maria      0.333
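
This works because R treats TRUE as 1 and FALSE as 0, so the mean of a logical vector is the share of TRUE values. A minimal sketch using Bill's three values from the question (the names bill, is_empty and prop_empty are only for illustration):

```r
# Bill's three values; mixing numbers with "" makes the column character
bill <- c("", "4", "3")

# Comparing against "" gives one TRUE/FALSE per cell
is_empty <- bill == ""

# TRUE counts as 1 and FALSE as 0, so the mean is the share of empty cells
prop_empty <- mean(is_empty)
round(prop_empty, 3)        # 0.333
round(100 * prop_empty, 1)  # 33.3
```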

CodePudding user response:

In base R, you can use aggregate:

aggregate(data~name, df, function(x) sum(x == '')/length(x))

#   name      data
#1   Ben 0.0000000
#2  Bill 0.3333333
#3  Emma 0.6666667
#4   Jon 0.0000000
#5 Maria 0.3333333
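
If the result should be a data frame with a percentage column, as in the question, one possible base R sketch uses tapply in place of aggregate. Rounding to two decimals is an assumption here; it gives 0.67 for Emma, not the 0.66 written in the question.

```r
# Same data as in the question
df <- data.frame(
  name = rep(c("Jon", "Bill", "Maria", "Ben", "Emma"), times = 3),
  data = c(1, "", 3, 1, "", 3, 4, "", 1, "", 1, 3, 3, 1, 3)
)

# Share of empty cells per name; the result is named and sorted by name
prop <- tapply(df$data == "", df$name, mean)

# Build the result with the column names used in the question
df_result <- data.frame(
  name = names(prop),
  percentage = round(as.vector(prop), 2)  # rounding is an assumption
)
df_result
```

Both mean(x == '') and sum(x == '')/length(x) compute the same proportion, so either can be used inside tapply or aggregate.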