I am trying to calculate the positivity rate per person, i.e. (number of 1s per person / total number of observations per person). My data set looks similar to this:
person | outcome |
---|---|
a | 1 |
a | 1 |
a | 0 |
a | 0 |
b | 1 |
b | 0 |
b | 0 |
c | 1 |
c | 1 |
I am hoping to return something that looks like this:
person | positiverate |
---|---|
a | 0.50 |
b | 0.33 |
c | 1.00 |
I feel like this should be fairly simple code, but I have not been able to figure it out so far.
CodePudding user response:
We can use a grouped mean. Because outcome is coded 0/1, the mean per person is the proportion of 1s:
library(dplyr)
df1 %>%
group_by(person) %>%
summarise(positiverate = mean(outcome))
-output
# A tibble: 3 × 2
person positiverate
<chr> <dbl>
1 a 0.5
2 b 0.333
3 c 1
data
df1 <- structure(list(person = c("a", "a", "a", "a", "b", "b", "b",
"c", "c"), outcome = c(1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L)),
class = "data.frame", row.names = c(NA,
-9L))
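To check the grouped mean against the definition in the question (number of 1s divided by number of observations), here is a minimal base R sketch. It rebuilds the same example data; the names `ones`, `total`, `explicit_rate` and `mean_rate` are only illustrative and are not part of the answer above.

```r
# Same example data as df1 above
df1 <- data.frame(
  person  = c("a", "a", "a", "a", "b", "b", "b", "c", "c"),
  outcome = c(1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L)
)

# Count the 1s and the observations per person, then divide
ones  <- tapply(df1$outcome == 1, df1$person, sum)
total <- tapply(df1$outcome, df1$person, length)
explicit_rate <- ones / total

# The grouped mean of a 0/1 column should give the same numbers
mean_rate <- tapply(df1$outcome, df1$person, mean)
all.equal(explicit_rate, mean_rate)
```

This shortcut relies on outcome containing only 0 and 1. With other codings, or with missing values, the explicit count is the safer route.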
CodePudding user response:
Base R:
aggregate(. ~ person, df, mean)
# or, if you prefer to have positiverate as the column name
aggregate(cbind(positiverate = outcome) ~ person, df, mean)
data.table, for faster data manipulation:
library(data.table)
# setDT() converts df to a data.table by reference
setDT(df)[, .(positiverate = mean(outcome)), by = person]
CodePudding user response:
Try tapply():
tapply(X = df$outcome, INDEX = df$person, FUN=mean)
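Note that tapply() returns a named vector rather than a data frame. If the two-column layout from the question is wanted, one possible way to convert it is sketched below. It assumes the data frame is called `df` as in this answer; `rates` and `result` are illustrative names, and rounding to two decimals is only there to match the format shown in the question.

```r
# Example data, assumed to be stored as df
df <- data.frame(
  person  = c("a", "a", "a", "a", "b", "b", "b", "c", "c"),
  outcome = c(1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L)
)

# Named vector of rates, one element per person
rates <- tapply(X = df$outcome, INDEX = df$person, FUN = mean)

# Turn the names and values into the two columns from the question
result <- data.frame(
  person       = names(rates),
  positiverate = round(as.vector(rates), 2)
)
result
```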