I have a dataframe like this:
df <- data.frame(name = c("Jon", "Bill", "Maria", "Ben", "Emma", "Jon", "Bill", "Maria", "Ben", "Emma", "Jon", "Bill", "Maria", "Ben", "Emma"), data = c(1, "", 3, 1, "",3, 4, "", 1, "", 1, 3, 3, 1, 3)
)
I would like to calculate the percentage of empty cells for each person and store it in another dataframe. The result should look like this:
df_result<-data.frame(name=c("Ben", "Bill", "Emma", "Jon", "Maria"), percentage=c(0, 0.33, 0.66, 0, 0.33))
I have tried the group_by function from the dplyr package:
df_result<-df%>%group_by(name)%>%summarise(miss_count=count(data))
df_result$percentage<-1-(df_result$miss_count/3)
I have searched for an answer for a long time. If this is a duplicate, I am sincerely sorry.
Thanks in advance!
CodePudding user response:
We can use mean on a logical vector to get the proportion of empty cells (multiply by 100 for a percentage):
library(dplyr)
df %>%
  group_by(name) %>%
  summarise(miss_count = mean(data == ''))
-output
# A tibble: 5 × 2
  name  miss_count
  <chr>      <dbl>
1 Ben        0
2 Bill       0.333
3 Emma       0.667
4 Jon        0
5 Maria      0.333
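To see why this works, here is a minimal sketch on a toy vector rather than the data from the question. Comparing with '' returns a logical vector, and mean treats TRUE as 1 and FALSE as 0, so the result is the share of empty cells. The vectors x and y are made up for illustration, and the NA handling at the end is an assumption: the data in the question only contains empty strings.

```r
# Toy vector (hypothetical); "" marks an empty cell
x <- c("1", "", "3")

is_empty <- x == ""   # logical vector: FALSE TRUE FALSE
sum(is_empty)         # number of empty cells
mean(is_empty)        # share of empty cells (TRUE counts as 1, FALSE as 0)

# Assumption: if NA should also count as missing, test for both conditions
y <- c("1", "", NA)
mean(is.na(y) | y == "")
```

The same expression can be dropped into summarise in place of mean(data == '') if the real data may contain NA values as well as empty strings.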
CodePudding user response:
In base R, you can use aggregate:
aggregate(data~name, df, function(x) sum(x == '')/length(x))
#   name      data
#1   Ben 0.0000000
#2  Bill 0.3333333
#3  Emma 0.6666667
#4   Jon 0.0000000
#5 Maria 0.3333333