I have a dataset similar to this:
> dput(df)
structure(list(Surgeon = c("John Smith", "John Smith", "John Smith",
"John Smith", "John Smith", "John Smith", "John Smith", "Martin Harris",
"Martin Harris", "Martin Harris", "Kyle Short"), Blood.Order = c("ABC",
"ABC", "DEF", "ABC", "IJK", "ABC", "DEF", "IJK", "ABC", "ABC",
"DEF"), Status = c("Returned", "Wasted", "Returned", "Returned",
"Wasted", "Wasted", "Wasted", "Returned", "Wasted", "Returned",
"Wasted")), class = "data.frame", row.names = c(NA, -11L))
I want to calculate how much blood (Blood.Order) each surgeon wasted as a function of how many surgeries they performed.
For example, John Smith performed 7 surgeries. In 4 of these 7 surgeries he wasted blood, so the result should be 4/7 = 0.5714286.
I want to create a loop that does this for each surgeon, i.e. divides the number of times each surgeon wasted blood by the total number of surgeries they performed.
A bar graph showing how much blood each surgeon wasted would be helpful, to see which surgeon(s) waste the most blood.
Thanks!
CodePudding user response:
We can do this without a loop: group by 'Surgeon' and take the mean of the logical vector (Status == "Wasted"). Because TRUE counts as 1 and FALSE as 0, that mean is the proportion of rows marked as wasted.
library(dplyr)
out <- df %>%
group_by(Surgeon) %>%
summarise(Prop = mean(Status == "Wasted"))
Output:
out
# A tibble: 3 × 2
  Surgeon        Prop
  <chr>         <dbl>
1 John Smith    0.571
2 Kyle Short    1
3 Martin Harris 0.333
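To check a proportion against the hand calculation in the question (4/7 for John Smith), it can help to keep the raw counts next to it. The following is only a sketch: it rebuilds the example data without the Blood.Order column, and the column names Wasted and Total are made up for illustration.

```r
library(dplyr)

# Example data from the question (Blood.Order omitted, it is not needed here)
df <- data.frame(
  Surgeon = c(rep("John Smith", 7), rep("Martin Harris", 3), "Kyle Short"),
  Status = c("Returned", "Wasted", "Returned", "Returned", "Wasted",
             "Wasted", "Wasted", "Returned", "Wasted", "Returned", "Wasted")
)

# Wasted and Total are illustrative column names, not part of the original answer
counts <- df %>%
  group_by(Surgeon) %>%
  summarise(
    Wasted = sum(Status == "Wasted"),  # number of rows marked "Wasted"
    Total  = n(),                      # number of surgeries (rows) per surgeon
    Prop   = Wasted / Total            # same value as mean(Status == "Wasted")
  )

counts
```

Prop here should agree with the mean-based version, since the mean of a logical vector is the count of TRUE values divided by its length.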
If we need a bar plot:
library(ggplot2)
ggplot(out, aes(x = Surgeon, y = Prop)) +
  geom_col()
Or using base R, selecting the "Wasted" column by name rather than by position:
barplot(proportions(table(df[-2]), 1)[, "Wasted"])
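Since the goal is to see which surgeons waste the most, one possible variation is to sort the bars before plotting. This sketch uses tapply() instead of table() and rebuilds the example data without Blood.Order. The variable name prop and the axis label are assumptions for illustration only.

```r
# Example data from the question (Blood.Order omitted, it is not needed here)
df <- data.frame(
  Surgeon = c(rep("John Smith", 7), rep("Martin Harris", 3), "Kyle Short"),
  Status = c("Returned", "Wasted", "Returned", "Returned", "Wasted",
             "Wasted", "Wasted", "Returned", "Wasted", "Returned", "Wasted")
)

# Proportion of rows marked "Wasted" for each surgeon
prop <- tapply(df$Status == "Wasted", df$Surgeon, mean)

# Put the highest proportion first
prop <- prop[order(prop, decreasing = TRUE)]

# Bar chart with a fixed 0 to 1 axis so the bars read as proportions
barplot(prop, ylab = "Proportion wasted", ylim = c(0, 1))
```

Note that a surgeon with very few surgeries, such as Kyle Short with a single one, can end up at the top of the chart on the strength of one row, so the proportions are easier to interpret alongside the counts.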